ABSTRACT


An analyzer of COBOL programs which computes the metrics of software
science is described. The report discusses the overall design of the
analyzer, including detailed descriptions of each of its modules. It also
contains instructions for the use and maintenance of the analyzer at Ohio
State University.


PREFACE 


This report is the result of research supported in part by the U. S.
Army Research Office of Scientific Research under contract
DAAG29-80-K-0061. It is being published by the Computer and Information
Science Research Center (CISRC) of the Ohio State University in conjunction
with the Department of Computer and Information Science. CISRC is an
interdisciplinary research organization which consists of the staff,
graduate students, and faculty of many University departments and
laboratories.
CHAPTER  1


INTRODUCTION 

It is a major theory of software science (Halstead 77) that if we
divide the basic elements of a given program into operators and operands
according to a proposed counting strategy, the statistics of these
operators and operands exhibit some interesting relationships to aspects of
software quality. It is hoped that these relationships can form a
quantitative basis for the analysis of software. In order for this
approach to gain widespread acceptance, it is necessary for these
relationships to be validated on different classes of programs.

Halstead  (Halstead  79),  in  a  treatise  of  software  science  research, 
cites  many  studies  which  have  sought  to  validate  these  relationships  on 
programs  written  in  many  languages.  Interestingly,  though,  none  of  these 
analyses  involve  COBOL!  More  recently,  (Zweben  and  Fung  79)  reported  on 
the  results  of  a  preliminary  study  of  COBOL  programs  which  were  counted 
manually.  However,  in  order  to  gather  large  amounts  of  data  on  COBOL 
programs  it  is  necessary  to  be  able  to  count  the  operators  and  operands  of 
COBOL  programs  mechanically.  The  computer  program  (or  analyzer,  as  it  acts 
almost  like  a  lexical  analyzer  for  COBOL)  described  in  this  report  is  the 
result  of  such  an  effort,  undertaken  at  The  Ohio  State  University,  to 
streamline  the  counting  process  for  the  study  of  software  science  metrics. 


It  should  be  mentioned  that  a  Software  Science  research  group  at 
Purdue  University  has  developed  another  COBOL  analyzer  (Shen  and  Dunsmore 
80).  The  Purdue  University  analyzer  is  written  in  COBOL  whereas  the  Ohio 
State University (OSU) analyzer is written in PL/I. Both analyzers are
capable of handling the Data as well as the Procedure divisions of a COBOL
program, as has been recommended by (Zweben & Fung 79). In addition, both
programs allow users the option of providing their own definitions of COBOL
operators and operands. In particular, the OSU analyzer offers the added
feature of context-sensitive counting of various keywords, as will be
discussed in this report.

The next chapter describes the overall design of the OSU analyzer,
and is intended for readers who may wish to study and/or modify the actual
programming details. Chapter 3 provides the details of how to use the
analyzer, either in "default" mode using a predefined counting strategy, or
with user-provided definitions of COBOL operators and operands. Chapter 4
briefly discusses future work to be done using this analyzer. For
completeness, four appendices are included. Appendix-A provides the
explicit design document for the entire analyzer program. Appendix-B gives
a detailed description of the existing files related to the analyzer.
Appendix-C describes a short procedure for maintaining the analyzer.
Finally, Appendix-D explains the procedure for invoking Purdue's analyzer
at OSU.


CHAPTER  2 


OVERVIEW  OF  THE  DESIGN  OF  THE  COBOL  ANALYZER


2.1  Program  Structure


The structure of the analyzer is based on the data structure of the
token stream of a COBOL source program, which can be pictured
(using a Jackson design notation [Jackson, 75]).


Nonreserved words are programmer defined symbols and in general all
nonreserved words except paragraph names are operands. Paragraph names are
identified by their location (beginning in Col. 8) in a source language
statement. Reserved words are language (compiler) defined symbols and they
are usually well documented in the language manual. Most COBOL reserved
words function as operators but some function as optional symbols to make
the sentence structure more English-like. In the usual counting strategy
of software science, these optional symbols are ignored and are thus called
noisewords. Some operators are context sensitive and additional actions
are needed to identify and process them. Since reserved words can be
defined precisely with the help of a language manual, operands (nonreserved
words) of a given source program can be identified by checking against the
list of reserved words.


The  analyzer  can  be  viewed  as  having  the  following  four  major  phases 


The data structures used as input/output to each of the phases are the
following.

1. Predefined operator and noiseword files.

2. List and Tree structures of operators/noisewords.

3. COBOL source program

   (a) Token list of the COBOL source program

   (b) Operator/Operand tree for the data and procedure
       divisions of the program.

4. Report of the Software Science metrics.

Initially, information about reserved words is used to build operator
and noiseword trees (module INITIAL & INITIA2). This information consists
of names of reserved words, alternative forms of these words (synonyms),
and any context sensitive information concerning their use as operators.

The input character stream of the source program is then broken down into a
token stream (module BLDTKN & GETOKEN). Every token is first processed
against the operator tree (module OPERATR & FILTER1). If it is not an
operator, or if the operator tree provides no synonym or context sensitive
information to determine how to deal with the token, it is processed
against the noiseword tree (module NOISE & FILTER2). If it turns out to be
none of the above, then the token is considered to be an operand and this
information is entered into the operand tree (module OPERAND & FILTER3).
After all the tokens are processed and the information concerning their
classification is built into the appropriate trees, a module STAT is
invoked to generate all the relevant statistics and the final output.


Note  that  two  modules  exist  for  each  process  other  than  the  statistics 
generation.  One  module  performs  the  process  for  the  DATA  division  and  the 
other  performs  the  corresponding  process  for  PROCEDURE  division. 

The program structure is best described by the following high level
routine (used in the program though details have been deleted here for
clarity).

INITIALIZE DATA STRUCTURES ;    /* Call INITIAL and INITIA2 */

DO WHILE (¬ END OF PROGRAM) ;

   CAPTURE A TOKEN ;    /* Call GETOKEN or BLDTKN */

   COMPARE THE TOKEN AGAINST THE DEFAULT OPERATOR TREE ;    /* Call OPERATR or FILTER1 */

   IF THE TOKEN IS AN OPERATOR
   THEN
      PROCESS THE TOKEN AS OPERATOR ;
   ELSE
      COMPARE THE TOKEN AGAINST THE NOISE TREE ;    /* Call NOISE or FILTER2 */

   IF THE TOKEN IS (¬ OPERATOR AND ¬ NOISEWORD)
   THEN
      PROCESS THE TOKEN AS AN OPERAND ;    /* Call OPERAND or FILTER3 */

END ;

PRODUCE THE ANALYZER REPORT ;    /* Call STAT */
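
The control flow of this routine amounts to a three-stage filter. The following Python sketch illustrates that idea only: it assumes the token stream has already been separated, uses plain sets in place of the operator and noiseword trees, and ignores synonyms and context sensitive rules. The name `classify_tokens` and the sample word lists are hypothetical and are not taken from the analyzer.

```python
from collections import Counter

def classify_tokens(tokens, operators, noisewords):
    """Count operators and operands in a stream of token strings.

    operators and noisewords are sets of reserved words; they stand in
    for the operator and noiseword trees (an assumption of this sketch).
    """
    operator_counts = Counter()
    operand_counts = Counter()
    for token in tokens:
        if token in operators:        # first filter: operator list
            operator_counts[token] += 1
        elif token in noisewords:     # second filter: noisewords are ignored
            continue
        else:                         # third filter: anything else is an operand
            operand_counts[token] += 1
    return operator_counts, operand_counts

# Hypothetical word lists and token stream, for illustration only.
ops, opnds = classify_tokens(
    ["ADD", "A", "TO", "B", "GIVING", "C", "MOVE", "A", "TO", "D", "IS"],
    operators={"ADD", "TO", "GIVING", "MOVE"},
    noisewords={"IS"},
)
print(dict(ops))
print(dict(opnds))
```

Sets give the same membership answer as a search tree; the trees in the analyzer additionally carry frequency counts and context information in their nodes.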

A characterization of the major data structures used by the program, a
structure chart, and a description of major modules are given in the
remaining sections.


2.2  Internal  Data  Structures 


Functionally  there  are  six  linked  list  structures  in  this  program. 
One  uses  a  binary  tree  structure,  four  use  a  linear  list  structure  and  the 
remaining  one  uses  a  circular  linked  list  structure. 


Binary  Search  Trees 

There  are  three  important  examples  of  the  binary  tree  node  structure 
in  the  analyzer.  These  are  the  operator,  noise,  and  operand  trees. 

Operator tree

[Figure: operator tree with root pointer Root1 and example nodes ADD and END]

The nodes in a tree are related to each other in lexicographic order
of the symbol with respect to the alphabet set used by the local computer
(e.g. EBCDIC). Each node contains information about the frequency of
occurrence of its token at the current point of the analysis (see the node
structure below). After the trees are completed it is a simple matter to
derive the software science measures ETA1, N1, ETA2, N2, and the frequency
distribution of the operators and operands (see STAT under Section 2.4).


[Figure: tree node structure]


In the analyzer the PL/I declaration for this structure is:


DCL 01 TREENODE        BASED (current),
       02 LPTR         POINTER,
       02 RPTR         POINTER,
       02 ASSOCIATE    POINTER,
       02 TKNPRT       POINTER,
       02 FREQCNTR     POINTER,
       02 DLMFLAG      BIT (1);


In  general,  left-link  and  right-link  help  to  define  the  tree 
structure  mentioned  above;  token  pointer  points  to  an  internal  list 
structure  of  the  token  associated  with  the  node  (token  list  described 
below);  associate  pointer  leads  to  an  associate  list  which  contains 
context  sensitive  information  about  the  token,  if  there  is  any; 
freq-counter  contains  the  occurrence  frequency  of  the  token;  and  DLM 
(delimiter)  is  a  flag  used  for  a  variety  of  purposes,  one  of  which  is 
to  indicate  whether  a  particular  operator  is  a  COBOL  verb. 
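
As an illustration of how such a node supports counting, the following Python sketch builds a binary search tree keyed on the token and increments a frequency field on every occurrence. It is a simplified model and not the analyzer's code: the token is stored directly in the node instead of through a token pointer, the associate pointer is omitted, and Python compares strings by code point where the analyzer uses the local collating sequence. The names `TreeNode`, `find_or_insert` and `in_order` are hypothetical.

```python
class TreeNode:
    def __init__(self, token):
        self.left = None       # plays the role of LPTR
        self.right = None      # plays the role of RPTR
        self.token = token     # stands in for the list reached via TKNPRT
        self.freq = 0          # plays the role of FREQCNTR
        self.is_verb = False   # one use of DLMFLAG

def find_or_insert(root, token):
    """Return (root, node), creating the node for token if it is absent."""
    if root is None:
        node = TreeNode(token)
        return node, node
    current = root
    while True:
        if token == current.token:
            return root, current
        branch = "left" if token < current.token else "right"
        child = getattr(current, branch)
        if child is None:
            child = TreeNode(token)
            setattr(current, branch, child)
            return root, child
        current = child

def in_order(node):
    """Yield the nodes in lexicographic order of their tokens."""
    if node is not None:
        yield from in_order(node.left)
        yield node
        yield from in_order(node.right)

root = None
for tok in ["MOVE", "ADD", "TO", "ADD", "END", "TO", "ADD"]:
    root, node = find_or_insert(root, tok)
    node.freq += 1

counts = [(n.token, n.freq) for n in in_order(root)]
print(counts)
```

An in-order traversal visits the tokens in sorted order, which is why the frequency distribution can be read off the completed tree directly.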


Token  List 


The tokens are represented internally by a linear list structure.
Each node contains two adjacent characters of a token. In general a
token of length x requires (x/2 + 1) nodes for its internal
representation, and the token pointer of a tree node points to the
first node of this list.


Synonym  List 

Synonymous  tokens  are  linked  up  through  the  synonym  pointer  (3rd 
field)  in  the  first  node  of  their  token  list. 


[Figure: token pointer and synonym pointer linking the token lists of synonymous tokens]


Associate  List 


The associate list is used to record context sensitive counting
strategy rules for COBOL keyword tokens. These rules are given as
"action pairs" in the input instruction list (see section 3.2 for the
syntax and semantics of action pairs). A token affecting a previous
token or being affected by a previous token has an associate list node
linked to its tree node. The structure of an associate list node is
similar to that of a tree node. If a token is sensitive to more than
one previous token, additional associate list nodes are linked through
the associate pointer field.

As an example, let us see how the data structure of an associate
list corresponds to the description of the context sensitive
relationship among the tokens 'ERROR', 'ON' and 'SIZE'.

Our counting strategy suggests that in the context of 'ON SIZE
ERROR', 'ON' and 'SIZE' should be counted as noisewords, and the token
'ERROR' representing this string should be counted as an operator.

This can be expressed as the input instruction 'ERROR > SIZE >> ON' in
the file OPER1 (see section 3.2 for a discussion of the syntax of
these input instructions). To implement this instruction (which is an
abbreviation of the two action pairs 'ERROR > SIZE' and 'ERROR >> ON')
we build the associate list as follows.


[Figure: associate list for the instruction 'ERROR > SIZE >> ON']


The leftmost node in this example is a tree node for the token
'ERROR' and has a frequency count of 15 (arbitrarily chosen). The
other two nodes in the example are not tree nodes, but are members of
the associate list for the token 'ERROR'. For the 'SIZE' node, the 1
in the frequency field denotes that in the case of 'ERROR' following
'SIZE' by one token, 'SIZE' is considered as a noiseword (a positive
value is associated with >). The 2 in the frequency field of 'ON' denotes
that in the case of 'ERROR' following 'ON' by two tokens, 'ON' should be
considered as a noiseword.

Historical  List 

This  is  a  circular  list  which  saves  the  last  ten  tokens  for 
backpatching  the  frequency  count  of  any  context  sensitive  token. 


[Figure: circular historical list, with MOVE as an example entry]
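
The interaction between the associate list and the historical list can be sketched in Python as follows. This is an illustration under stated assumptions and not the analyzer's implementation: the rules are held in a dictionary instead of associate list nodes, a bounded `deque` stands in for the circular list, every token (not only operators) is assumed to enter the history, and only the '>' family of relations is modelled. The names `RULES` and `count_operators` are hypothetical.

```python
from collections import Counter, deque

# keyword -> list of (distance back, earlier keyword to treat as a noiseword).
# This encodes the instruction 'ERROR > SIZE >> ON'.
RULES = {"ERROR": [(1, "SIZE"), (2, "ON")]}

def count_operators(tokens, operators, rules, history_size=10):
    counts = Counter()
    history = deque(maxlen=history_size)   # history[-1] is the most recent token
    for token in tokens:
        if token in operators:
            counts[token] += 1
        for distance, earlier in rules.get(token, []):
            if len(history) >= distance and history[-distance] == earlier:
                # Backpatch: the earlier token was already counted as an
                # operator, but in this context it is a noiseword.
                counts[earlier] -= 1
        history.append(token)
    return +counts   # unary plus drops entries whose count fell to zero

result = count_operators(
    ["ADD", "A", "TO", "B", "ON", "SIZE", "ERROR", "GO", "ON"],
    operators={"ADD", "TO", "ON", "SIZE", "ERROR", "GO"},
    rules=RULES,
)
print(dict(result))
```

In the sample stream the first 'ON' and the 'SIZE' are cancelled when 'ERROR' arrives, while the final 'ON', which is not followed by 'SIZE ERROR', keeps its count.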


Special  Character  List 

This linear list links up all special break-characters which are
also considered as tokens. This list is consulted for every incoming
character of the source program. The second character field of the
token node contains a ' or '?' depending on whether the character is
an operator or noiseword. This list is built as the (user defined or
default) operator and noiseword files are read into the analyzer.
Single character tokens from these files are special break characters.
Fig. 2.3b [structure chart not reproduced]

Note: INITIA2 has the same structure and subroutines as INITIAL.


2.4  Description  of  Major  Modules*: 


INITIAL (INITIA2): Initializes the fundamental data structures needed for
analyzing the Procedure (Data) division of a COBOL program. In particular,
this module builds operator and noiseword trees according to the input
instructions supplied by the user or the default file descriptions. It also
constructs the token, associate and synonym lists.

INITBLD:  Captures  a  token  from  the  input  instruction  stream  and  builds  a 
linked  list  structure  for  the  token. 

SCRNKWD: Screens the keyword (token) captured by INITBLD. If it is a new
member of the operator/noiseword tree then it is inserted into the
appropriate tree. Otherwise the linked list structure of the token is
freed.

ACTION: Processes the action pairs (see Section 3.2) from the input
instructions. These action pairs define synonyms and context sensitive
information.

COUNTDD  (COUNTPD):  Scans  the  Data  (Procedure)  division  of  a  COBOL  program. 
Each  occurrence  of  a  token  is  classified  on  the  basis  of  the  counting 
strategy  defined  in  INITIA2  (INITIAL),  and  the  frequency  count  of  each 
unique  operator  or  operand  is  updated. 

GETNBUC:  Reads  an  input  record  of  the  COBOL  source  program  starting  from 
column  8  and  returns  the  first  nonblank  character  in  the  record. 

*  A  more  thorough  description  of  each  module  is  contained  in  Appendix-A. 


GETOKEN (BLDTOKEN): Scans the records in Data (Procedure) division of a
COBOL source program, and captures tokens from the input character stream.
It also builds the linked list structure for the tokens.

This routine is more powerful than INITBLD in that it can capture a
literal string as a token and it can determine the function of a variety of
break-characters. For example, '-' may be used as a hyphen or as the
subtraction operator, and '.' may be treated as a decimal point in a real
number or as a delimiter.

OPERATR (FILTER1): Processes the incoming token in Data (Procedure)
division against the operator tree of Data (Procedure) division.

If a match is found in the operator tree, the module increments the
frequency count of the matching node and frees the storage of the incoming
token. Otherwise it passes the token to NOISE (FILTER2).

NOISE (FILTER2): Checks the incoming token against the noiseword tree of
the Data (Procedure) division. If a match is found in the noise tree,
the module frees the current token. Otherwise it passes the token down to
the next phase, namely OPERAND (FILTER3).

OPERAND (FILTER3): The incoming token must be an operand by default. This
module searches the operand tree of Data (Procedure) division in order to
see if a match exists. If a match is found, the module increments the
frequency count of the matching node and frees the token. Otherwise it
inserts the new token into the operand tree of the Data (Procedure)
division and updates its frequency count.


STAT: Produces a report of the analysis. STAT makes use of the operator
and the operand trees developed during the earlier stages. In traversing the
operator tree, the number of unique operators (ETA1) is obtained by
counting the tree nodes having nonzero frequency. The total number of
operators (N1) is found by adding up the values of all the frequency counter
fields. A similar treatment is followed for the operand tree in order to
find the number of unique operands and the total number of operands (i.e.
ETA2 and N2 respectively). This module also produces the frequency
distribution of all the operators and operands in the different divisions
of a COBOL program in sorted order.
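
The two counts taken from each tree can be illustrated with a short Python sketch. It assumes the frequency counts have already been collected into dictionaries (in the analyzer they sit in the tree nodes); the function name `basic_counts` and the sample data are hypothetical.

```python
def basic_counts(frequencies):
    """Return (number of unique entries, total occurrences).

    Only entries with a nonzero frequency are counted as unique, since a
    predefined operator may never occur in the program being analyzed.
    """
    used = [f for f in frequencies.values() if f > 0]
    return len(used), sum(used)

# Hypothetical frequency tables for one division.
operator_freq = {"MOVE": 3, "ADD": 1, "TO": 4, "SUBTRACT": 0}
operand_freq = {"A": 4, "B": 2, "1234": 1}

eta1, n1 = basic_counts(operator_freq)
eta2, n2 = basic_counts(operand_freq)
print(eta1, n1, eta2, n2)
```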

REORDER: Sorts the operator and operand trees of the Procedure division as
well as the operator tree of the Data division in order of frequency
counts. It also provides information about ETA1, N1, ETA2, N2 and the
number of statements in the appropriate division.

SORT3: This module is a slightly modified version of REORDER, and is used
to sort the operand tree of the Data division. In addition to the number
of unique operands and the total number of operands in the Data division,
it also calculates the number of common operands between the operand trees
of the Data and Procedure divisions. This number of common elements is
used to find the number of unique operands in the entire program, since the
number of unique operands in the whole program = (sum of the unique operands
in both Data and Procedure divisions) - (number of common operands between
the Data and Procedure divisions).
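
This is the inclusion-exclusion rule for two sets. A minimal Python sketch, assuming the unique operands of each division are available as collections of names (the function name and sample operands are hypothetical):

```python
def program_unique_operands(data_operands, procedure_operands):
    """Return (unique operands in the whole program, common operands)."""
    data = set(data_operands)
    procedure = set(procedure_operands)
    common = len(data & procedure)
    return len(data) + len(procedure) - common, common

unique, common = program_unique_operands(
    ["EMPLOYEE-NUMBER", "PAY-RATE", "HOURS"],
    ["PAY-RATE", "HOURS", "GROSS-PAY", "1234"],
)
print(unique, common)
```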


SOFTMET: Produces the final values of all Software Science metrics for
Data and Procedure divisions.
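
The report does not list here which metrics SOFTMET computes. As an illustration only, the following Python sketch derives four standard software science quantities from ETA1, N1, ETA2 and N2, using the usual definitions from (Halstead 77): vocabulary, length, estimated length and volume. The function name `software_science_metrics` is hypothetical, and the sketch assumes both ETA1 and ETA2 are nonzero.

```python
import math

def software_science_metrics(eta1, n1, eta2, n2):
    """Basic software science measures from the four counts."""
    vocabulary = eta1 + eta2                       # eta = eta1 + eta2
    length = n1 + n2                               # N = N1 + N2
    estimated_length = (eta1 * math.log2(eta1)     # N^ = eta1 log2 eta1
                        + eta2 * math.log2(eta2))  #      + eta2 log2 eta2
    volume = length * math.log2(vocabulary)        # V = N log2 eta
    return {
        "vocabulary": vocabulary,
        "length": length,
        "estimated_length": estimated_length,
        "volume": volume,
    }

metrics = software_science_metrics(eta1=4, n1=8, eta2=4, n2=8)
print(metrics)
```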

PRNTWID: This module generates separate files containing the frequency
distribution of all the tokens in both Data and Procedure divisions. The
file is broken down into 80 character records, and is composed of node
units. Each node unit corresponds to a token and consists of a 3-byte node
length, a 4-byte frequency count, and the token symbol of up to 256
characters.

REPORT: Reads three different files, namely, SYSUT0, SYSUT1 and SYSUT2,
generated by PRNTWID and STAT. SYSUT0, SYSUT1 and SYSUT2 contain all
information from the Data division, Procedure division and the program,
respectively, necessary for producing the desired output. It generates a
report of Software Science metrics. The frequency distributions of tokens
in the Data and Procedure divisions are displayed in parallel. Summary
accounts of the Data division, Procedure division, and the entire program
are displayed at the end (see section 3.3).


CHAPTER  3 


USE  OF  THE  COBOL  ANALYZER 


In order to obtain Software Science measures, a set of unambiguous
rules which defines the partition of symbols into operators and operands in
a language is required. Different authors may come up with slightly
different rules for the same language, even if the same language compiler
is being used. The discrepancies are often due to disagreement of
interpretation of the definitions of operator and operand given in
(Halstead 77). An operand is defined as a "variable or constant" in a
program. An operator is defined as "a symbol or combination of symbols
that affect the value or ordering of an operand" in a program.

To allow for changes in the counting procedure by different users, the OSU
analyzer allows the option of invoking a predefined set of rules or
defining a new set of rules. Section 3.1 describes the former option, and
Section 3.2 discusses the latter.


3.1  Default  Mode 


The  counting  strategy  used  in  this  analyzer  considers  entries  in  both 
Data  and  Procedure  Divisions.  It  is  governed  by  the  following  rules: 

1.  OPERANDS  (Data  and  Procedure  divisions). 

Any  reference  to  a  distinct  operand  is  counted  as  an  occurrence 
of  that  operand.  An  operand  is  any  of  the  following: 

a.  A  file-name,  e.g.,  CARD-INPUT-FILE. 

b.  An  identifier,  e.g.,  EMPLOYEE-NUMBER. 

c.  A  literal,  e.g.,  'BILL'  or  1234. 

A  paragraph  name  or  section  name  is  not  considered  as  an 
operand.  Together  with  PERFORM  or  GO  TO  it  is  considered  as  an 
operator. 


2.  OPERATORS  (Procedure  Division). 

Any  reference  to  a  distinct  operator  is  counted  as  an 
occurrence  of  that  operator.  An  operator  is  any  of  the  following: 

a.  A  logical  operator,  e.g.,  OR,  AND. 

b. A relational operator, e.g., =, EQUAL, LESS THAN.

c. An arithmetic operator, e.g., + and /.

d.  A  key  word  or  required  word  in  a  valid  COBOL  statement  with  the 
exception  of  GO  TO,  PERFORM,  CALL  and  ALTER.  Any  group  of  keywords 
functioning  as  an  operator  is  counted  as  a  single  operator.  Examples 
of  such  keywords  or  keyword  combinations  are:  IF,  ELSE,  NEXT 
SENTENCE,  UNTIL,  AT  END,  READ,  and  OPEN. 
e. Noisewords are not considered as operators, and are ignored in
the counting.

f. A transfer of control: Any transfer of control to a paragraph
name, section name or subprogram name is counted as an occurrence of the
operator associated with that name, e.g., GO TO paragraph-1, GO TO
paragraph-2, and PERFORM paragraph-1 are all distinct operators.

g. A parenthesis pair, e.g., A - (B - C) has one occurrence of the
operator denoted as 'parenthesis pair'.

3.  OPERATORS  (Data  Division). 

All keywords/required words in a valid COBOL statement are
considered as operators, e.g., FD, BLOCK, VALUE, REDEFINES, PICTURE,
etc.

4. Each occurrence of a COBOL verb adds one to the count of the
operator denoted as 'end of statement'.

5.  Periods,  commas  and  semi-colons  are  not  counted. 

Complete lists of the operators and noisewords for both Data and
Procedure divisions, based on ANSI COBOL 1973 and including language
extensions from IBM OS Version 4, are shown in Appendix-B. The records
are ordered so as to create well balanced tree structures for the
analyzer (see Chapter 2 for details concerning these structures), with an
effort to put more frequently occurring entities near the root of the tree.




To run a job at Ohio State's Instruction and Research Computer Center
(IRCC) using default mode, the following sequence of inputs is required:


//jobname  JOB  ...account number...
//PROCLIB  DD  DSN=TS0613.PROCLIB,DISP=SHR
//  EXEC  OSUCNTPM
//SOURCE  DD  *
    (source program)


The JCL file, OSUCNTPM, used to run the analyzer is listed in
Appendix-B.

It should be noted that currently the students of different
COBOL courses at OSU use the WIDJET and WYLBUR on-line systems to run
their jobs on the AMDAHL 470. WIDJET and WYLBUR users utilize the
following set of JCL to run the analyzer.


//  JOB
/*JOBPARM  V=D
//PROCLIB  DD  DSN=TS0618.PROCLIB
//  EXEC  INTERNAL
//PROGRAM  DD  *
$JOB  xxxxxx  student-name
    (COBOL source program)
$ENTRY
//


where xxxxxx corresponds to the 2-digit LAB-ID and 4-digit AUTHOR-ID.


Unlike 'OSUCNTPM', this JCL allows the analyzer to gather outputs into
a separate disk file (see Appendix-C). The detailed structure of INTERNAL
and other necessary JCL invoked by INTERNAL are given in Appendix-B.

It is worth mentioning that recently an EXEC program has been
developed which allows students to run the analyzer more conveniently,
while keeping the analyzer secure from student modification. The entire
EXEC program listing is also included in Appendix-B for completeness.
Because of the present facility, the students need to use only a simple
command (called ANALYZE) to run their program through the analyzer instead
of using the JCL given above.


3.2  User  Defined  Operators  and  Operands 

It  is  also  possible  to  run  the  analyzer  with  user  defined  operators 
and  operands.  When  using  the  analyzer  in  this  mode  three  input  files  are 
required.  They  are  each  discussed  below. 

Source  Program 
DDname:  SOURCE 

This  file  contains  the  COBOL  program  to  be  analyzed.  Each  record  is 
80  bytes  long  and  corresponds  to  a  line  of  the  source  program.  The  size  of 
the  source  program  is  not  limited  by  the  analyzer  but  by  the  memory 
available  because  operand  storage  is  dynamically  allocated  and  freed. 
Currently  only  one  source  program  can  be  analyzed  in  one  execution. 

Operator  File 
DDname: OPER

This file contains the definitions of operators for the counting
strategy. Each record is 80 bytes long.

The  syntax  of  an  operator  file  is: 

[.]  keyword-1  [relation  keyword-2] . 

The period before keyword-1 is optional. Its occurrence indicates
that keyword-1 should be considered as a COBOL verb. Since a COBOL
statement is delimited by the verb of the next statement, counting the
verbs of COBOL is an indirect way to count the number of statements in the
source program.

Keyword-1 is a symbol to be considered as an operator in the counting
strategy. If keyword-1 has been registered before, it is not registered
again.

The 'relation keyword-2' pair is optional. Its presence indicates
that a certain action is to be performed in the context of keyword-1 and
keyword-2. Multiple action pairs signify multiple actions on keyword-1.
The kind of action to take place is defined by the relation of the
instruction as follows:
The  kind  of  action  to  take  place  is  defined  by  the  relation  of  the 
instruction  as  follows: 

'=' means that keyword-2 is to be considered as a synonym of keyword-1.
They are to be treated as equivalent tokens in the counting strategy.

'>'  means  that  on  encountering  keyword-1,  if  the  most  recent  token  is 
keyword-2,  keyword-2  is  to  be  considered  as  a  noiseword. 

'<'  means  that  on  encountering  keyword-1,  if  the  most  recent  token  is 
keyword-2,  keyword-1  is  to  be  considered  as  a  noiseword. 

'>>'  means  that  on  encountering  keyword-1,  if  the  second  most  recent  token 
is  keyword-2,  then  keyword-2  is  to  be  considered  as  a  noiseword.  Up  to  six 
multiple  '>'  may  be  used,  implying  that  backtracking  by  six  operators  is 
possible. 

'<<'  means  that  on  encountering  keyword-1,  if  the  second  most  recent  token 
is  keyword-2,  then  keyword-1  is  to  be  considered  as  a  noiseword.  Up  to  six 
multiple  '<'  may  be  used. 



One easy way to recognize which keyword is to be considered as a
noiseword is to look at the 'point' of the relation. The relational
operator always points at the noiseword.

e.g. keyword-1 >>>> keyword-2 << keyword-3

The first relation points at keyword-2; thus keyword-2 is to be
considered as a noiseword. But the second relation points at keyword-1;
thus in the context of keyword-1 and keyword-3, keyword-1 is to be
considered as a noiseword. Note that this input record does not define any
relationship between keyword-2 and keyword-3.

By entering into the operator file a list of relation-keyword
pairs, the user of this analyzer may define the operators and operands of
his counting strategy based on information obtained from the language
manual supplied by the vendor.
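
To make the record syntax concrete, the following Python sketch parses one operator-file record into its parts. It is a reading of the syntax described above and not the analyzer's ACTION module: fields are assumed to be separated by blanks, the optional period may be attached to keyword-1 or written as a separate field, and keywords that themselves consist of relation characters are not handled. The names `RELATION` and `parse_operator_record` are hypothetical.

```python
import re

# '=' or a run of one to six '>' or one to six '<'.
RELATION = re.compile(r"^(=|>{1,6}|<{1,6})$")

def parse_operator_record(record):
    """Return (keyword-1, is_verb, [(relation, keyword-2), ...])."""
    fields = record.split()
    if not fields:
        raise ValueError("empty record")
    is_verb = fields[0].startswith(".")
    if is_verb:
        fields[0] = fields[0][1:]
        if not fields[0]:            # period written as a separate field
            fields.pop(0)
    if not fields:
        raise ValueError("missing keyword-1")
    keyword, rest = fields[0], fields[1:]
    if len(rest) % 2:
        raise ValueError("relation without keyword-2")
    actions = []
    for relation, other in zip(rest[0::2], rest[1::2]):
        if not RELATION.match(relation):
            raise ValueError(f"unknown relation {relation!r}")
        actions.append((relation, other))
    return keyword, is_verb, actions

print(parse_operator_record(".ADD"))
print(parse_operator_record("CORRESPONDING = CORR"))
print(parse_operator_record("ERROR > SIZE >> ON"))
```

The length of a '>' or '<' run gives the distance to look back in the token history, so `len(relation)` is all a later stage would need from the relation string.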




Example  1 

Different versions of a COBOL compiler may have different
repertoires of reserved words. The counting program used under different
COBOL compilers must reflect this variation through entries into the
operator file. For example, reserved words that begin with the letter 'b'
under two popular compilers have the following difference.


1974 ANSI COBOL      1973 IBM OS Version 4

                     basis
before               before
                     beginning
blank                blank
block                block
bottom               bottom
by                   by


In switching from ANSI COBOL to the compiler for IBM Version 4, two
keywords, 'basis' and 'beginning', are to be added into the operator file.
That is, suppose the entries for reserved words that begin with the letter
'b' under a 1974 ANSI COBOL compiler are (n cards existing before these
entries):


                  111111111
column   123456789012345678...

Card n+1 BEFORE
Card n+2 BLANK
Card n+3 BLOCK
Card n+4 BOTTOM
Card n+5 BY

Then the entries under a 1973 IBM OS Version 4 compiler become:


                  111111111
column   123456789012345678...

Card n+1 BEFORE
Card n+2 BLANK
Card n+3 BLOCK
Card n+4 BOTTOM
Card n+5 BY
Card n+6 BASIS
Card n+7 BEGINNING


Example  2 

According to the DEC-10 Version 4 COBOL compiler, the use of the
EXAMINE verb should follow the general format:


                             { ALL         }
EXAMINE identifier TALLYING  { LEADING     }  literal-1
                             { UNTIL FIRST }

        [REPLACING BY literal-2]


Underlined  capitalized  words  are  considered  as  keywords.  Capitalized 
words  which  are  not  underlined  are  considered  noisewords.  To  define 
lexical  units  according  to  the  above  general  format  the  following  cards  are 
entered  into  the  operator  file. 


.EXAMINE
ALL < TALLYING
LEADING
FIRST < UNTIL
BY < REPLACING


The period before 'EXAMINE' instructs the analyzer to consider every
occurrence of 'EXAMINE' as an occurrence of an operator as well as an
occurrence of the operator 'end of statement'.

'ALL < TALLYING' instructs the analyzer to consider 'ALL' as a
noiseword if its preceding word is 'TALLYING'; otherwise both 'ALL' and
'TALLYING' are to be considered as operators.

'LEADING'  instructs  the  analyzer  to  register  'LEADING'  as  an  operator. 
However,  since  there  is  no  period  preceding  it,  no  occurrence  of  the  'end 
of  statement'  operator  is  registered. 

'FIRST  <  UNTIL'  and  'BY  <  REPLACING'  are  instructions  which  enable  the 
keyword  pointed  at  to  be  considered  as  operator  or  noiseword  according  to 
context,  similar  to  'ALL  <  TALLYING'. 


According to the following general format,

     { CORRESPONDING }
ADD  {               }  identifier-1 TO identifier-2
     { CORR          }

     [ROUNDED] [ON SIZE ERROR imperative-statement]

the  operator  file  should  include  the  following  cards: 

.ADD
CORRESPONDING = CORR
TO
ROUNDED
ERROR > SIZE » ON

The  period  before  'ADD'  is  to  instruct  the  analyzer  to  register  'ADD' 
as  a  COBOL  verb.  Thus  every  occurrence  of  'ADD'  increments  the  count  of 
both  'ADD'  and  'end  of  statement'. 

'CORRESPONDING = CORR' instructs the analyzer to consider
'CORRESPONDING' and 'CORR' as equivalent tokens.

'TO' and 'ROUNDED' instruct the analyzer to register these two
symbols as operators.

'ERROR  >  SIZE  »  ON'  instructs  the  analyzer  to  ignore  'SIZE'  if  on 
encountering  'ERROR'  the  most  recent  token  is  'SIZE',  and  to  ignore  'ON'  if 
on  encountering  'ERROR'  its  second  most  recent  token  is  'ON'. 
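
The card layout used in the two examples above (an optional leading period, a keyword, then zero or more symbol/keyword action pairs) can be illustrated with a short sketch. This is not the analyzer's own code: the function name parse_card and the returned tuple are hypothetical, and the sketch assumes that fields on a card are separated by blanks and that every relationship symbol is followed by exactly one keyword.

```python
def parse_card(card):
    """Split one operator-file card into (keyword, is_verb, action_pairs).

    Assumed layout: [.]KEYWORD [symbol KEYWORD]...
    The set of legal relationship symbols is not checked here.
    """
    fields = card.split()
    if not fields:
        raise ValueError("empty card")

    keyword = fields[0]
    # A leading period marks a COBOL verb, which also counts as
    # an occurrence of the 'end of statement' operator.
    is_verb = keyword.startswith(".")
    if is_verb:
        keyword = keyword[1:]

    rest = fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("each relationship symbol needs one keyword")

    # Pair every symbol with the keyword that follows it.
    action_pairs = list(zip(rest[0::2], rest[1::2]))
    return keyword, is_verb, action_pairs
```

For instance, parse_card("ALL < TALLYING") yields the keyword 'ALL', a false verb flag, and the single action pair ('<', 'TALLYING').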


Noiseword file


DDname: NOISE

This file contains all the individual noisewords which are not
considered as operators or operands at any time during the analysis. Each
individual noiseword is placed on a separate input record (80 bytes long).
Some noisewords are sensitive to context and are considered as operators
only in certain situations. For example, 'TO' is a noiseword in 'EQUAL TO'
but is an operator in 'MOVE X TO Y.' Information of this kind should be
supplied by the user to the operator file (see above examples). The
noiseword file only contains symbols which are always considered to be
noisewords. This set of 'true' noisewords has to be determined before
analysis in order to define the operands of the counting strategy. Any
token which is not contained in the operator file or the noiseword file
will be categorized as an operand.

The following is the JCL required to run the analyzer at Ohio State
University's Instruction and Research Computer Center if user-defined
operators and operands are employed:


//jobname  JOB  ...account number...
//         EXEC PGM=OSUCNTPM
//STEPLIB  DD   DSN=TSQ618.LOADLIB,DISP=SHR
//OPER     DD   *

    (operator file)

/*
//NOISE    DD   *

    (noiseword file)

/*
//SOURCE   DD   *

    (source program)

/*
//SYSPRINT DD   SYSOUT=A
//


3.3  Output  of  the  Analyzer 

The  following  outputs  are  generated  by  the  analyzer: 

-  Echoes of the operator file and noiseword file

-  List of tokens scanned during the analysis

-  Frequency distribution of operators and operands

-  Number of unique operators (ETA1)

-  Number of operator occurrences (N1)

-  Number of unique operands (ETA2)

-  Number of operand occurrences (N2)

-  Vocabulary (ETA)

-  Program Length (N)

-  Estimated Program Length (NH)

-  Total Number of Statements (NOS)

-  Program Volume (V)

-  Program Level (LH)

-  Language Level (LAMBDA)

-  Intelligence Content (INTELL)

-  Programming Effort (EFFORT)
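
The derived quantities in this list can be obtained from the four basic counts. The sketch below uses the standard software science estimators (estimated length, volume, estimated level, language level, intelligence content, and effort); the formulas actually coded in the analyzer are not given in this section, so the sketch should be read as an assumption rather than a description of the program. The function name software_science is hypothetical.

```python
import math


def software_science(eta1, n1, eta2, n2):
    """Derive software science metrics from the four basic counts.

    eta1, eta2: unique operators / unique operands
    n1, n2:     operator occurrences / operand occurrences
    All four counts are assumed to be positive.
    """
    eta = eta1 + eta2                      # vocabulary
    n = n1 + n2                            # program length
    nh = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)  # estimated length
    v = n * math.log2(eta)                 # program volume
    lh = (2.0 / eta1) * (eta2 / n2)        # estimated program level
    return {
        "ETA": eta,
        "N": n,
        "NH": nh,
        "V": v,
        "LH": lh,
        "LAMBDA": lh * lh * v,             # language level
        "INTELL": lh * v,                  # intelligence content
        "EFFORT": v / lh,                  # programming effort
    }
```

With eta1 = 4, n1 = 8, eta2 = 4, and n2 = 8, the vocabulary is 8, the length is 16, the volume is 16 * log2(8) = 48, and the estimated level is (2/4) * (4/8) = 0.25.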

A  sample  of  the  actual  output  format  produced  by  the  analyzer  for  the 
frequency  distributions  and  metrics  summary  follows. 


CHAPTER  4 


SUMMARY 


The present form of the analyzer described herein, developed by the software metrics
research  group  at  Ohio  State  University,  handles  operators  and  operands  in 
both  Data  and  Procedure  divisions  of  a  COBOL  program.  The  availability  of 
this  analyzer  makes  it  possible  to  collect  a  substantial  amount  of  data 
from  various  sources  of  COBOL  programs.  These  data  will  provide  the 
opportunity  for  more  extensive  and  critical  analysis  of  the  Software 
Science  metrics,  and  their  applicability  to  such  important  areas  as 
programming  time  prediction  and  error  prediction. 
APPENDIX-A
DESIGN DOCUMENT


This  appendix  provides  the  explicit  description  of  each  individual 
module  in  the  analyzer  program.  Included  for  each  module  is  a  detailed 
functional  description  (with  a  list  of  subprocesses)  and  its  interfaces 
with  the  other  modules  in  the  program. 



Main  Module:  CJTTPGM 
Functional  Description: 

This is the control module of the analyzer.

Subprocesses:

1.  Initialize  history  list,  namely  six  different  tree  roots. 

2. Define the counting strategy for both data and procedure divisions
(INITIA2, INITIAL).

3. Get author-ID and program-ID if the program is to be run for data
collection. 

4.  Count  data  division  (COUNTDD). 

5.  Count  procedure  division  (COUNTPD). 

6. Compute software metrics and print a report (STAT).

Interface: 


OUT  IN

1.  1.  TOP-1 : Top of the operator tree for Procedure division (PD).

2.  2.  TOP-2 : Top of the noise tree for PD.

3.  3.  TOP-3 : Top of the operand tree for PD.

4.  4.  TOP-6 : Top of the operator tree for Data division (DD).

5.  5.  TOP-7 : Top of the noise tree for DD.

6.  6.  TOP-8 : Top of the operand tree for DD.

10.  Final report of the analysis.

7.  CIRPTR : Pointer to last token of the circular list.

8.  LAB-ID : 2-digit Lab-ID.

9.  AUTH-ID : 4-digit author-ID.

Module  :  INITIAL 
Functional  description: 

Keywords and action pairs in the operator (OPER) and Noise (NOISE)
files define the counting strategy of the Procedure division. This
subprogram translates the counting strategy into operator and noiseword
trees which will be used in the subsequent analysis.

Subprocesses : 

1. Capture a keyword from the input record of the OPER file and put it into a
list structure (INITBLD).

2. Examine the operator tree and determine if the keyword has been linked
into the tree. If it is not already present, the new keyword is inserted
into the tree (SCRNKWD).


3.  Process  each  action  pair  after  the  keyword.  This  process  involves 
capturing  the  relationship  symbol  and  the  associated  keyword,  and 
constructing  the  associate  list  that  implements  the  relationships  between 
two  keywords  (ACTION). 

4. Repeat the above steps with the NOISE file.


Interface: 


OUT  IN

1.  1.  TOP : Root of a given tree (Operator or Noiseword tree for PD).

2.  2.  TRIAL-PTR : Address of the linked-list structure of the
captured keyword/noiseword.

3.  SCRNKWD : Returned tree node of the screened keyword/noiseword.

4.  NEXT-CARD : Current input line image.

5.  CURSOR : Pointer to the current character in NEXT-CARD.


Module:  INITIA2 
Functional Description:


This  subprogram  constructs  the  fundamental  data  structure  needed  to 
process  the  data  division  of  a  COBOL  program.  In  particular,  it  translates 
the  appropriate  counting  strategy  into  operator  and  noise  trees  for 
analyzing  the  data  division. 

INITIA2 has exactly the same structural and functional organization as
INITIAL, with the exception that INITIA2 uses the operator and Noise files
for the Data division whereas INITIAL uses the operator and Noise files for
the Procedure division [see Appendix-B]. The detailed module description for
INITIA2 is omitted in order to avoid repetition.


Module: INITBLD
Functional Description:

This subprogram picks up a token from the input instruction record
and constructs a linked list structure for that token.

Subprocesses:

1. Assemble a token string by collecting one character at a time until a
blank or the end of the card is reached.

2. Transform the token string into a linked list structure.

3. Return the address of the linked list structure.

Interface:


OUT  IN

1.  NEXT-CARD : Current input line image.

2.  CURSOR : Pointer to the current character in NEXT-CARD.

3.  TRIAL-PTR : Address of the linked list structure of the captured
keyword/noiseword.


Module: SCRNKWD


Functional Description:

This subprogram determines whether the given keyword/noiseword captured by
INITBLD is a new member. If it is an old member, the storage of the
given keyword/noiseword is freed; otherwise it is inserted into the tree.

Subprocesses : 

1. Traverse the appropriate tree to determine if a match in the tree can be
found for the given keyword/noiseword (SURCH).

2.  If  a  match  exists,  the  storage  of  the  keyword/noiseword  is  freed 
(FBESTKN). 

3. If a match cannot be found, the new keyword/noiseword is linked into the
tree  (INSERT). 

Interface: 


OUT  IN

1.  1.  TOP : Root of a given tree.

2.  2.  TRIAL-PTR : Address of the linked list structure of
the keyword/noiseword.

3.  SURCH : Returned tree node of the matched keyword/
noiseword or "null".

4.  SAVE : Tree node of the insertion.

5.  SCRNKWD : Returned tree node of the screened keyword/noiseword.

Module:  ACTION 
Functional  Description: 

This subprogram captures the relationship symbol (e.g. >, =, <) and
the associated keyword of the action pairs. It also constructs the
associate-list that implements the relationship between the keywords.

Subprocesses:

1. Capture the first character of the relationship symbol.

2. If the first character is a '=', process the next keyword as a synonym
through the synonym list.

3. If the first character is a '<', process the next keyword as a context
sensitive element through the associate-list.

4. If the first character is a '>', process the next keyword as a context
sensitive element through the associate-list.


Interface: 


OUT  IN

1.  1.  TOP: Root of the operator/noise tree.

2.  TRIAL-PTR: Address of first keyword in the list
structure.

3.  3.  INITBLD: Returned address to list structure of the
associated keyword.

4.  SCRNKWD: Returned tree node of the screened keyword/noiseword.

5.  NEXT-CARD: Current input line image.

6.  CURSOR: Pointer to the current character in NEXT-CARD.


Module: SURCH
Functional Description:

This subprogram searches a binary tree by comparing the token
character string associated with each node of the tree until a match is
found or end of search is encountered.


Subprocesses:


1.  Compare  the  given  character  string  with  the  string  associated  with  the 
tree  nodes. 

2.  If  both  the  strings  are  equal,  return  the  address  of  the  tree  node. 

3.  If  the  given  string  is  larger,  access  the  tree  node  on  the  left. 

4. If the given string is smaller, access the node on the right.

5.  Repeat  the  above  steps  until  a  match  or  end  of  search  is  encountered. 
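
The search steps above can be sketched as follows. The sketch keeps the convention stated in the steps (a larger string descends to the left, a smaller one to the right). The class TreeNode and the companion routine insert are hypothetical and are included only so that the example is self-contained; the analyzer itself stores tokens as linked lists rather than as plain strings.

```python
class TreeNode:
    """One node of a token tree (hypothetical layout)."""

    def __init__(self, token):
        self.token = token
        self.count = 0
        self.left = None
        self.right = None


def surch(top, token):
    """Return the node holding `token`, or None at end of search."""
    help_node = top
    while help_node is not None:
        if token == help_node.token:
            return help_node
        # Larger strings are found on the left, smaller on the right.
        if token > help_node.token:
            help_node = help_node.left
        else:
            help_node = help_node.right
    return None


def insert(top, token):
    """Insert `token` with the same ordering convention; return the root."""
    node = TreeNode(token)
    if top is None:
        return node
    current = top
    while True:
        if token > current.token:
            if current.left is None:
                current.left = node
                return top
            current = current.left
        else:
            if current.right is None:
                current.right = node
                return top
            current = current.right
```

A tree built by inserting 'MOVE', 'ADD', and 'TO' in that order has 'TO' on the left of the root and 'ADD' on the right.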


Interface: 
OUT  IN

1.  TOP: Root of a given tree.

2.  2.  TRIAL-PTR: Address of the given token list.

5.  COMPARE: Returned result of comparison.

3.  SURCH: Returned address of the tree node that matches, or trial-pointer.

4.  HELP: The address of the tree node to be compared.


OUT  IN

1.  TOP: Root of a given tree.

2.  2.  TOKEN: Pointer to the current token list.

4.  COMPARE: Returned result of comparison.

3.  HELP: Pointer to the tree node to be compared.

5.  SAVE: Tree node address of insertion.

Module:  COMPARE 
Functional  Description: 

This subprogram compares the character strings contained in two given
token list structures. It returns 'GT' if the first character string is
lexically of higher order than the second one. It returns 'EQ' if they are
equal and returns 'LT' if the first string is of lower order than the
second one.

Interface: 


OUT  IN 

1.  PTR-1 :  Pointer  address  to  first  character  string 

2.  PTR-2:  Pointer  address  to  second  character  string. 

3.  COMPARE:  returned  result  of  comparison. 


Module: COUNTDD
Functional Description:

This module scans the Data division of a COBOL program. Each
occurrence of a token is classified on the basis of the counting strategy
defined in INITIA2, and the frequency count of each unique operator and
operand is updated.

Subprocesses:

1. Capture a token from the input string (GETOKEN).

2. The current token is compared with the elements of the operator tree. If
a match is found, then the current token is processed as an operator
(OPERATE). Otherwise the token is passed to step 3.

3. The incoming token is compared with the entries of the noise tree. If an
identical token exists in this tree, the current token is processed as a
noiseword (NOISE); otherwise the token is passed to step 4.

4. In this stage, the current token is treated as an operand and processed
accordingly (OPERAND).

5. Step 1 through step 4 are repeated until the end of the Data division.
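
The three-stage classification described in these steps can be sketched in a few lines. The sketch replaces the operator and noise trees with plain sets and ignores the context-sensitive action pairs, so it illustrates only the order of the tests; the function name count_tokens and the sample operator and noiseword sets are hypothetical.

```python
from collections import Counter


def count_tokens(tokens, operators, noisewords):
    """Classify each token as operator, noiseword, or operand.

    Returns (operator_counts, operand_counts); noisewords are dropped.
    """
    operator_counts = Counter()
    operand_counts = Counter()
    for token in tokens:
        if token in operators:        # step 2: operator tree
            operator_counts[token] += 1
        elif token in noisewords:     # step 3: noise tree
            continue
        else:                         # step 4: everything else is an operand
            operand_counts[token] += 1
    return operator_counts, operand_counts
```

For example, with 'PIC' declared as an operator and 'IS' as a noiseword, the tokens of '01 TOTAL PIC IS 9(5)' give one operator occurrence and three operand occurrences.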


Interface: 




OUT  IN 

1.  1.  TOP-6  :  Root  of  the  Operator  tree  for  DD 

2.  2.  TOP-7  :  Root  of  the  Noise  tree  for  DD 

3.  3.  TOP-8  :  Root  of  the  Operand  tree  for  DD 

4.  CIRPTR  :  Pointer  to  the  entry  of  history  list 

5.  5.  TOKEN  :  Pointer  to  the  current  token  list 

6.  RETURN-FLAG  :  Flag  returned  by  OPERATR  or  NOISE 
7.  NEXT-CARD : Current input line image

8.  CURSOR  :  Index  to  current  character  in  NEXT-CARD 

9.  BUFF-CHAR  :  Buffer  for  the  current  token  string  in  process 

10.  BUFF-PTR  :  Index  to  current  character  in  buffer 


Module: GETOKEN
Functional Description:

1. Find the first nonblank character of a token.

2. Capture all the adjacent characters starting with the first character
until a blank is encountered.

3. If the last character of the string collected in step 2 is a period, then
ignore that period.

4. Generate a linked list structure for this new token.

5. Repeat step 1 through step 4 for the entire Data division.


Interface: 


1.  1.  NEXT-CARD : Current input line image.

2.  2.  CURSOR : Pointer to current character in NEXT-CARD.

3.  3.  BUFF-CHAR : Buffer for current token string.

4.  4.  BUFF-PTR : Pointer to current character in buffer.

5.  5.  KAR : Current character in process.

6.  6.  TOKEN : Pointer to token list.


Module:  GETHBIX 

Functional Description:

This module finds the first nonblank character in an input record.

Subprocesses:

1. Check every character of the input string (starting from column 8 of the
input record) until the first nonblank character is encountered.

2. Return the current nonblank character.

Interface: 


OUT  IN

1.  1.  NEXT-CARD : Current input line image.

2.  2.  CURSOR : Pointer to current character in NEXT-CARD.

3.  3.  BUFF-CHAR : Buffer for current token string.

4.  4.  BUFF-PTR : Pointer to current character in buffer.

5.  5.  KAR : Current character in process.


Module:  GETCHAR 
Functional Description:

This module returns the next relevant character in the input
character stream. Characters in comments and labels, and characters before
column 8 and after column 72, are considered irrelevant.

Subprocesses : 

1.  Increment  cursor  by  1. 

2.  Read  in  another  card  when  cursor  is  equal  to  73. 

3. Skip comments.

4.  Skip  labels. 

5.  Return  the  character  to  buffer  and  check  for  the  overflow  of  the  buffer. 
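
A minimal sketch of the column handling described above follows. It assumes that a comment card is one with an asterisk in column 7, as in standard fixed-format COBOL, and it does not attempt the label skipping or the buffer overflow check listed in the steps. The generator name relevant_chars is hypothetical.

```python
def relevant_chars(cards):
    """Yield the characters in columns 8 through 72 of each card.

    Cards with '*' in column 7 are treated as comments and skipped
    (an assumption based on fixed-format COBOL). Labels are not skipped.
    """
    for card in cards:
        card = card.ljust(72)           # pad short cards with blanks
        if card[6] == "*":              # column 7 is index 6
            continue
        for column in range(8, 73):     # columns 8..72, counted from 1
            yield card[column - 1]
```

Joining the characters yielded for the card '000100 MOVE A TO B.' gives 'MOVE A TO B.' followed by blanks up to column 72; anything in columns 73 through 80 is never returned.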

Interface: 


OUT  IN 

1.  1.  NEXT-CARD: Current input line image.

2.  2.  CURSOR: Pointer to the current character in NEXT-CARD.

3.  3.  BUFFER: Buffer for the current token string (may be partial).

4.  4.  BUFF-PTR: Pointer to the last character of the token string in
buffer.

5.  5.  KAR: Current character in process.


Module:  CU TOKEN 


Functional  Description: 

This module constructs a linked-list structure of a particular token
contained in the buffer area. Each node of the linked list contains only
two adjacent characters of the token.

Subprocesses:

1. Allocate a node and initialize all fields in the node. Pointer to this
node is returned to the calling module.

2. Fill up the node with the first two characters of the buffer area.

3. Step 1 and step 2 are repeated until all characters of the buffer are
included in the list.

4.  Empty  the  buffer  area. 
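
The token list described above, with two adjacent characters per node, can be sketched as follows. The class CharNode and the function names are hypothetical; the variable tail plays the role that the interface below assigns to HELP, holding the last node while the list is under construction.

```python
class CharNode:
    """One list node holding at most two adjacent characters of a token."""

    def __init__(self, chars):
        self.chars = chars
        self.next = None


def build_token_list(buffer):
    """Build the linked list for the token in `buffer`; return its head."""
    head = None
    tail = None                      # last node built so far
    for start in range(0, len(buffer), 2):
        node = CharNode(buffer[start:start + 2])
        if head is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head


def token_string(head):
    """Recover the token string from its linked list."""
    parts = []
    while head is not None:
        parts.append(head.chars)
        head = head.next
    return "".join(parts)
```

The token 'EXAMINE' becomes four nodes holding 'EX', 'AM', 'IN', and 'E'.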

Interface: 


OUT  IN 

1.  BUFFER:  Buffer  for  the  currently  captured  token  string. 

2.  BUFF-PTR:  Pointer  to  the  last  character  of  the  above  token 
string. 

3.  TOKEN:  Pointer  to  the  token  list  constructed. 

4.  HELP:  A  temporary  pointer  which  helps  to  hold  the  address  of  the  last 
node  in  construction  of  the  list. 


Module: RECEDE

Functional Description:

In processing a character string, it becomes necessary to look ahead
one or more characters. This module enables look-ahead by providing a
mechanism to recover the last character(s), if necessary, in the input
string.


Subprocesses:

1. Decrement cursor by 1.

2.  Decrement  buffer  pointer  (BUFF-PTR)  by  1. 


Interfaces : 


OUT  IN 

1.  1.  CURSOR:  Pointer  to  the  current  character  in  input  string. 

2.  2.  BUFF-PTR:  Pointer  to  the  last  character  of  the  token  string  in 

buffer. 


Module: OPERATE

Functional Description:

This function subroutine returns "true" if a match of the current
token exists in the operator tree; otherwise it returns "false".

Subprocesses : 

1.  Search  the  operator  tree  and  determine  if  the  current  token  is  an 
operator  by  comparing  the  current  token  with  the  members  of  the  operator 
tree. 

2.  If  the  token  is  an  operator,  then  increase  the  frequency  counts  of  the 
token  by  1  and  free  the  incoming  token  list  structure  since  only  one 
version  of  the  same  token  list  is  needed. 

3. If the token is not an operator, return "false".

Interface: 


OUT  IN 

1.  1.  TOP-6  :  Root  of  the  operator  tree  for  DD 

2.  2.  TOKEN  :  Pointer  to  token  list  in  process 

3.  SURCH : Returned tree node address of the matched token or
"null".


4.  RETURN-FLAG  :  Flag  containing  1  or  0  depending  on  whether  the  current 
token  exists  in  the  operator  tree  or  not. 


Module: NOISE


Functional Description:

This function subroutine returns "true" if the current token is found
to be a noiseword. Otherwise it returns "false". Noisewords are ignored
in the present analysis.

Subprocesses:

1. Search the noise tree to find whether the current token is a noiseword,
by comparing the token with each member of the noise tree.

2. If the token is found to be a noiseword then return "true" and free the
current token list; otherwise return "false".

Interface: 


OUT  IN 

1.  1.  TOP-7  :  Root  of  the  noise  tree  for  DD 

2.  2.  TOKEN  :  Pointer  to  current  token  list 

3.  SURCH  :  Returned  tree  node  address  of  the  matched  token  or 
"NULL" 

4.  RETURN-FLAG : Flag containing "true" or "false", returned by NOISE.
Module: OPERAND
Functional Description:

This module treats every incoming token as an operand. It searches
the operand tree and inserts the incoming token into the tree only if it is
a new member.

Subprocesses:

1. Search the operand tree and compare the incoming or current token with
the members of the tree.

2. If the current token is found to be a new member, then insert this token
into the operand tree and update the frequency count.

3. If the token already exists in the operand tree, then free the token
list and increase the frequency count of the existing member by 1.


Interface:

OUT  IN

1.  1.  TOP-8 : Root of the operand tree for DD.

2.  2.  TOKEN : Pointer to current token list.

3.  SURCH : Returned tree node address of the matched token or
"NULL".

4.  SAVE : Tree node address of insertion.


Module: COUNTPD
Functional Description:

This module scans the Procedure division of a COBOL program. Each occurrence
of a token is classified according to the counting strategy defined in
INITIAL, and the frequency count of each unique operator, noiseword, or
operand is updated.

Subprocesses:

1. Capture a token from the input (BLDTKN).

2. The token is compared with entries in the operator tree. If a match is
found, the token is processed as an operator (FILTER1). Otherwise continue
to the next step.

3. The token is compared with entries in the noise tree. If a match is
found, the token is processed as a noiseword (FILTER2). Otherwise proceed
to the next step.

4. The token is processed as an operand (FILTER3).

5. Repeat the above process until the end of the program.


Interface: 


OUT  IN

1.  1.  TOP-1 : Root of the operator tree for Procedure division.

2.  2.  TOP-2 : Root of the noiseword tree for Procedure division.

3.  3.  TOP-3 : Root of the operand tree for Procedure division.

4.  4.  CIRPTR : Pointer to entry of the history list.

6.  6.  TOP-8 : Root of the operand tree for DD.

7.  DONE-FLAG : Flag returned by FILTER1 or FILTER2.

5.  TOKEN : Resultant pointer to the token list built by BLDTKN.


Module:  BLDTKN 
Functional  Description: 

It scans the source program input stream and captures the next token
on the basis of a set of delimiters (e.g. blank, period, comma,
etc.) defined in the COBOL language.

Suborocesses : 

1.  If  the  first  nonblank  character  of  a  potential  token  string  is  a  period, 
comma,  or  right  parenthesis,  then  skip  the  character. 

2.  Capture  and  '**' 

3.  Capture  V','*' 

4.  Capture  literals  enclosed  by  quotes  (subroutine  LITERAL) 

5.  Capture  character  strings  delimited  on  the  right  by  blank, 

and  .  Embedding  or  is  allowed. 

6. Character strings of 'CALL', 'PERFORM', 'ALTER', 'GO' and 'NOTE' are
processed separately by the subroutines named CALL-IT, PERFORM, ALTER, GOTO
and NOTE respectively.

Interfaces : 

OUT  IN

1.  TOP-1 : Operator tree for new insertion of operator.

2.  TOKEN : Pointer to token list.

3.  CURSOR : Index to current character in NEXT-CARD.

4.  NEXT-CARD : Current input line image.

5.  BUFFER : Buffer for current token string in process.

6.  BUFFER-PTR : Index to current character in Buffer.


7.  KAR  :  current  character 


8.  CIRPTR  :  Pointer  to  entry  of  the  history  list 


Module:  FILTER1 


Functional  Description: 


This  function  subroutine  returns  "true"  and  updates  the  frequency 
count  of  a  token  captured  by  BLDTKN  if  a  match  is  found  in  the  operator 
tree.  Otherwise  it  returns  "false"  for  passage  of  the  token  to  FILTER2. 


Subprocesses : 

1.  Search  the  operator  tree  to  determine  if  the  token  is  an  operator.  If 
it  is  not  an  operator  then  exit  this  function  and  return  "false". 

Otherwise  process  the  token  according  to  the  following  steps. 

2.  Increase  the  frequency  count  of  this  token  by  one,  and  free  the  incoming 
token  list  structure  since  only  one  version  of  the  same  token  list  is 
needed. 

3.  If  the  token  found  is  potentially  sensitive  to  past  tokens,  the  history 
list  containing  the  past  ten  operators  should  be  examined.  If  indeed  the 
token  can  affect  a  past  token  or  be  affected  by  a  past  token,  appropriate 
action  should  be  taken  to  address  this  situation. 

4.  Update  the  history  list  for  the  new  token. 

5.  Return  'true'  to  COUNTPD. 
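
The use of the history list in step 3 can be illustrated with the 'ERROR > SIZE » ON' card from Example 2. The sketch below is only one possible reading of that rule: it assumes that a keyword absorbed by a later keyword has already been counted and that "ignoring" it means undoing that count. The function name count_with_context and the mapping absorb (each keyword mapped to the keywords it absorbs, nearest first) are hypothetical.

```python
from collections import Counter, deque


def count_with_context(tokens, operators, absorb):
    """Count operators, letting some keywords absorb recent ones.

    absorb["ERROR"] == ["SIZE", "ON"] means: on meeting ERROR, ignore
    SIZE if it is the most recent operator, and ignore ON if it is the
    second most recent operator.
    """
    counts = Counter()
    history = deque(maxlen=10)        # the past ten operators
    for token in tokens:
        if token not in operators:
            continue
        counts[token] += 1
        for distance, expected in enumerate(absorb.get(token, []), start=1):
            if len(history) >= distance and history[-distance] == expected:
                counts[expected] -= 1  # undo the earlier count
        history.append(token)          # step 4: update the history list
    return counts
```

Applied to the tokens of 'ADD A TO B ON SIZE ERROR', with all five keywords declared as operators, the sketch leaves counts of one for 'ADD', 'TO', and 'ERROR' and zero for 'ON' and 'SIZE'.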


Interface: 


OUT  IN

1.  1.  TOP-1 : Root of the operator tree.

2.  2.  TOKEN : Pointer to token list in process.

4.  SURCH : Returned tree node address of the matched token or
'null' for non-match.

3.  FILTER1 : Returned 'true' or 'false' for 'confirmed operator' or
'confirmed non-operator'.

Module:  FILTER2 
Functional  Description: 

This function subroutine returns "true" if a match is found in the
noise tree. Since keeping counts of noisewords serves no purpose for the
present analysis, noisewords are not counted.

Subprocess:

1. Search the noise tree to determine if the current token is a noiseword. If
it is, then free the token and return "true". Otherwise return "false".

Interfaces : 


OUT  IN 

1.  1.  TOP-2  :  Root  of  the  noise  tree 

2.  2.  TOKEN  :  Pointer  to  token  list  in  process 

4.  SURCH  :  Returned  tree  node  address  of  the  matched  token  or 
'null'  for  non-match. 

3.  FILTER2 : Returned "true" or "false" for confirmed noiseword or
confirmed non-noiseword.


Module: FILTER3
Functional Description:

It treats every incoming token as an operand. Most operands have
been defined in the operand tree for the Data division. Therefore the DD
operand tree is searched first to avoid duplicate representation of a
unique token. The PD operand tree is searched next and only new operands
are inserted into this tree.

Subprocesses:

1. Search the DD operand tree to determine if the token has been declared
in the Data division. If it has, free the token list and use the one already
in use.


2.  Search  the  PD  operand  tree  to  determine  if  the  token  is  already  in  the 
tree  or  not.  If  it  is  not,  insert  the  token  into  the  tree. 

3.  Increase  the  frequency  count  by  1. 

Interface:


1.  1.  TOP-3 : Root of the operand tree.

2.  2.  TOKEN : Pointer to token list.


3.  TOP-8:  Root  of  the  operand  tree  for  DD. 

4.  SURCH:  Returned  tree  node  address  or  "null"  for  unmatched  token. 

5.  SAVE:  Tree  node  address  of  insertion. 



Module: STAT
Functional Description:

This module produces readable printouts for all the parameters of
software science.

Subprocesses:

1. Sort the operator and operand trees of both data and procedure divisions
so that the tokens are arranged in order of frequency counts. Also
calculate the number of unique operators/operands and the total number of
operators/operands during the sort process (REORDER and SORT8).

2. Compute the values of all software science metrics (SOFTMET).

3. Generate separate files containing the detailed information (e.g.
frequency distribution of all tokens, summary record for each division) for
data and procedure divisions, as well as for the whole program (PHHTWTD).

4. Use the files generated in step 3 to print out the appropriate report
(REPORT).


Interface:


OUT  IN

1.  1.  TOP-6: Root of DD operator tree.

2.  2.  TOP-1: Root of PD operator tree.

3.  3.  TOP-3: Root of PD operand tree.

4.  4.  TOP-8: Root of DD operand tree.

5.  LAB-ID: 2-digit Lab-ID (required only for students' jobs).

6.  AUTHOR-ID: 4-digit author ID (required only for students' jobs).

7.  7.  TOP-9: Root of the frequency tree of DD operator.

8.  8.  TOP-4: Root of the frequency tree of PD operator.

9.  9.  TOP-5: Root of the frequency tree of PD operand.

10.  10.  UNIQUE: Number of unique operators/operands.

11.  11.  OCCURRENCE: Total number of operators/operands.

12.  12.  EOS: Number of statements in data/procedure division.

13.  13.  TOP-10: Root of the frequency tree of DD operand.

14.  ETA2-INTERSECT: Common elements between DD and PD operand trees.

15.  N: Program length

16.  ETA: Vocabulary

17.  NH: Estimated program length

18.  V: Program volume

19.  LH: Program level

20.  LAMBDAH: Language level

21.  INTELL: Intelligence content

22.  EFFORT: Programming effort

(Items 15 through 22 are the Software Science metrics.)

23.  23.  SYSUT0: File containing summary record and frequency counts of
Data division.

24.  24.  SYSUT1: File containing summary record and frequency counts of
Procedure division.

26.  26.  Final output of the Analyzer.

25.  SYSUT2: File containing the summary record for the whole program.


Module :  REORDER 
Functional  Description: 

This  module  sorts  (according  to  frequency  counts)  the  operator  tree 
and  the  operand  tree  of  the  procedure  division  as  well  as  the  operator  tree 
of  the  data  division.  It  also  finds  the  number  of  unique  operators 
(operands)  and  total  number  of  operators  (operands)  while  sorting  the 
trees. 

Subprocesses:

1. Traverse the operator/operand trees using a postorder traversal algorithm.

2.  The  visited  node  is  pruned  from  the  original  tree  and  is  inserted  into 
the  frequency  tree,  in  which  all  the  tokens  are  arranged  in  sequential 
order  of  frequency. 

Interface : 


OUT  IN 

1.  1.  ROOT: Root of the given tree.

2.  2.  FREQ-TOP: Root of the frequency tree.

3.  UNIQUE: # of unique operators/operands.

4.  OCCURRENCE: Total # of operators/operands.

5.  EOS: # of statements in Data division/Procedure division.


This module inserts the tree node carried by 'root' into the
frequency tree topped by 'Freq-top'. The insertion procedure of this
routine differs from that of INSERT in the following ways.

(a)  The  sort  key  is  the  frequency  count  and  then  the  character  string  of 
the  token. 

(b)  The  inserted  unit  is  a  tree  node  instead  of  token  pointer. 

(c)  No  storage  allocation  is  needed.  In  other  words,  building  the 
frequency  tree  does  not  require  extra  storage. 

Suborocesses : 

1.  Use  Che  root  of  the  original  tree  as  the  top  of  the  frequency  tree. 

2.  If  the  frequency  count  of  the  next  vistited  nod*  is  greater  than  the 
frequency  count  of  the  frequency  top,  go  to  the  right  and  insert  the 
current  node  into  the  frequency  tree  if  right  pointer  is  null. 

3.  If  the  frequency  count  of  the  next  visited  node  is  less  than  that  of  the 
frequency  top  then  go  to  the  lefc  node.  If  the  left  pointer  is  null  then 
insert  the  current  node  into  the  frequency  tree. 

4.  If  the  frequency  count  of  the  next  visited  node  is  equal  to  the 
frequency  count  of  the  frequency  top,  then  insert  the  current  node 
depending  on  the  lexical  order  of  the  token  string*. 


1. ROOT: Root of a given tree.

2. FREQ-TOP: Root of the frequency tree

3. HELP1: Pointer to character string 1

4. HELP2: Pointer to character string 2

5. COMPARE: Returned result of the comparison between the token strings.
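
The relinking performed by REORDER and its insertion routine can be sketched as follows. This is a minimal illustration in Python, not the analyzer's own code: the names Node, token_insert, freq_insert, reorder and inorder are hypothetical, and in this sketch the first node visited by the postorder walk (rather than the original root) becomes the top of the frequency tree.

```python
class Node:
    """One tree node: a token, its frequency count, and two child links."""
    def __init__(self, token, freq=1):
        self.token, self.freq = token, freq
        self.left = self.right = None


def token_insert(root, token):
    """Build the original tree, ordered by token string; repeats bump freq."""
    if root is None:
        return Node(token)
    cur = root
    while True:
        if token == cur.token:
            cur.freq += 1
            return root
        side = "left" if token < cur.token else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            setattr(cur, side, Node(token))
            return root
        cur = nxt


def freq_insert(freq_top, node):
    """Relink an existing node into the frequency tree (no new storage)."""
    node.left = node.right = None
    if freq_top is None:
        return node
    cur = freq_top
    while True:
        # Sort key: frequency count first, then the token string.
        smaller = (node.freq, node.token) < (cur.freq, cur.token)
        side = "left" if smaller else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            setattr(cur, side, node)
            return freq_top
        cur = nxt


def reorder(root):
    """Postorder walk: both subtrees are relinked before their parent."""
    state = {"top": None, "unique": 0, "occurrence": 0}

    def visit(node):
        if node is None:
            return
        visit(node.left)
        visit(node.right)
        state["unique"] += 1              # one more distinct token
        state["occurrence"] += node.freq  # running total of all tokens
        state["top"] = freq_insert(state["top"], node)

    visit(root)
    return state["top"], state["unique"], state["occurrence"]


def inorder(node):
    """List (freq, token) pairs in ascending order of the sort key."""
    if node is None:
        return []
    return inorder(node.left) + [(node.freq, node.token)] + inorder(node.right)
```

Because a postorder walk finishes with both children before it touches their parent, each node can be detached and reused safely, which is why no extra storage is needed.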


Module: SORT8
Functional Description:

This module sorts the operand tree of the Data division. It also
finds the total number of common operands between the Data and Procedure
divisions.

Subprocesses:

1. For each token in the DD operand tree, search the operand tree of the
Procedure division to determine whether the current token also exists there.

2. If the current DD operand also exists in the operand tree of the
Procedure division, increment the number of common tokens by 1.

3. Carry out Step 1 and Step 2 for all the tokens in the DD operand
tree.

4. Sort the DD operand tree using the same procedure as the REORDER
routine.

Interface:


OUT IN

1. 1. TOP-8: Root of the operand tree of DD

2. 2. TOP-3: Root of the operand tree of PD

3. TENPTR: Pointer to current token in DD operand tree

4. SUBCa: Returned address of the matched tree node or 'null'

5. TOP-10: Root of the frequency tree of DD operand

6. ETA2-INTERSECT: Common operands between DD and PD operand trees.

7. UNIQUE: Number of unique operators/operands.

8. OCCURRENCE: Total number of operators/operands.
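
The counting of common operands in steps 1 to 3 can be sketched as below. This is only an illustration under stated assumptions: trees are held as Python lists of the form [token, left, right], ordered by token string, and the function names insert, search, tokens and eta2_intersect are hypothetical rather than taken from the analyzer.

```python
def insert(tree, token):
    """Insert a token into a token-ordered tree stored as [token, left, right]."""
    if tree is None:
        return [token, None, None]
    if token < tree[0]:
        tree[1] = insert(tree[1], token)
    elif token > tree[0]:
        tree[2] = insert(tree[2], token)
    return tree


def search(tree, token):
    """Return the matching node, or None when the token is absent."""
    while tree is not None and tree[0] != token:
        tree = tree[1] if token < tree[0] else tree[2]
    return tree


def tokens(tree):
    """Yield every token of the tree once."""
    if tree is None:
        return
    yield from tokens(tree[1])
    yield tree[0]
    yield from tokens(tree[2])


def eta2_intersect(dd_tree, pd_tree):
    """Count the DD operands that also occur in the PD operand tree."""
    return sum(1 for t in tokens(dd_tree) if search(pd_tree, t) is not None)
```

Each DD operand costs one search of the PD tree, so the count is obtained without building any additional structure.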


Module: SOFTMET
Functional  Description: 

This routine produces the final values of all the software science
metrics for Data and Procedure divisions.

Interface:


OUT IN

1. UNIQUE: Number of unique operators/operands for
Data or Procedure division.

2. OCCURRENCE: Total number of operators/operands for
Data or Procedure division.

3. N: Program length

4. ETA: Vocabulary

5. NH: Estimated program length

6. V: Program volume

7. LH: Program level

8. LAMBDAH: Language level

9. INTELL: Intelligence content

10. EFFORT: Programming effort

(Items 3 through 10 are the Software Science metrics.)
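
As an illustration of how these eight values follow from the four basic counts, the sketch below applies Halstead's standard estimators. The module description above lists only the metric names, so the mapping of each name to a formula is an assumption, and the function name software_science is hypothetical.

```python
import math


def software_science(eta1, eta2, n1, n2):
    """Compute the metrics from unique (eta1, eta2) and total (n1, n2) counts.

    eta1/n1 refer to operators, eta2/n2 to operands. All four counts are
    assumed to be positive.
    """
    n = n1 + n2                                   # N: program length
    eta = eta1 + eta2                             # ETA: vocabulary
    nh = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)  # NH: estimated length
    v = n * math.log2(eta)                        # V: program volume
    lh = (2.0 / eta1) * (eta2 / n2)               # LH: estimated program level
    return {
        "N": n,
        "ETA": eta,
        "NH": nh,
        "V": v,
        "LH": lh,
        "LAMBDAH": lh * lh * v,                   # language level
        "INTELL": lh * v,                         # intelligence content
        "EFFORT": v / lh,                         # programming effort
    }
```

For example, with 8 unique operators, 74 unique operands, 238 operator occurrences and 159 operand occurrences, the sketch gives N = 397 and an estimated length of about 483.5.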


Module:  PRNTWID 
Functional  Description: 

This  module  generates  files  containing  the  frequency  distribution  of 
all  the  tokens  in  both  Data  and  Procedure  divisions. 

Subprocesses:

1. Traverse the operator/operand tree using an inorder traversal algorithm.

2. Construct records containing the frequency distribution of the tokens.


Interface:


OUT IN

1. 1. TOP: Root of a given frequency tree

2. 2. SYSUT0: File containing frequency distributions of tokens
in the Data division.

3. 3. SYSUT1: File containing frequency distributions of tokens
in the Procedure division.

4. PW-REC: 80-character record of SYSUT0 or SYSUT1.


Module:  TRAVERSE 


Functional  Description: 

This recursive subroutine is used to traverse a particular frequency
tree and produce the records of the file SYSUT0 or SYSUT1.

Subprocess:

1. At every node of the frequency tree, find the frequency count of the
token and write an 80-character record (NODE-OUT).

Interface:


OUT IN

1. TOP: Root of a given frequency tree

2. TREE-NODE: Node unit of the operator/operand tree

3. PW-REC: 80-character record of SYSUT0 or SYSUT1

4. SYSUT0: File containing information for the Data division

5. SYSUT1: File containing information for the Procedure division
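
The cooperation between TRAVERSE and its per-node output routine NODE-OUT can be sketched as follows. The report does not give the layout of the 80-character record, so the layout used here (token left-justified, count right-justified) is purely an assumption, as are the tuple representation of tree nodes and the function names node_out and traverse.

```python
def node_out(token, freq, width=80):
    """Format one fixed-width record: token on the left, count on the right."""
    count = str(freq)
    room = width - len(count)            # columns left for token plus padding
    # Truncate long tokens so that at least one blank separates the fields.
    return token[: room - 1].ljust(room) + count


def traverse(node, write):
    """Inorder walk of a frequency tree held as (token, freq, left, right)."""
    if node is None:
        return
    token, freq, left, right = node
    traverse(left, write)                # smaller sort keys first
    write(node_out(token, freq))         # one record per node
    traverse(right, write)               # larger sort keys last
```

Passing `records.append` (for a list named records) or a file's `write` method as the second argument collects the records in ascending order of frequency.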




Module: NODE-OUT
Functional Description:

Given a frequency tree node, this subroutine finds the frequency
count of the token. It also writes an 80-character record of SYSUT0/SYSUT1.

Interface:


OUT IN

1. TREE-NODE: Node unit of the operator/operand tree.
2. PW-REC: 80-character record of SYSUT0 or SYSUT1


Module: REPORT
Functional  Description: 

This module produces the well-documented final output of the
analyzer for a COBOL program.

Subprocesses:

1. Read the files SYSUT0, SYSUT1 and SYSUT2.

2. Use SYSUT0 to produce the frequency distribution of the tokens
as well as the software metrics for the Data division.

3. Use SYSUT1 to produce the frequency distribution of the tokens
as well as the software metrics for the Procedure division.

4. Use SYSUT2 to provide the software metrics for the entire program.

5. Print all the information of steps 2, 3, & 4.

Interface:


OUT IN

1. 1. SYSUT0: All the information for DD

2. 2. SYSUT1: All the information for PD

3. SYSUT2: Summary record for the program

4. BODY: Information to be printed.

5. WIDIPLA: Output line of 133 characters

6. BUFFER0: Buffer containing token strings of DD

7. FREQ0: Frequency counts of tokens in DD

8. BUFFER1: Token strings of PD

9. FREQ1: Frequency counts of tokens in PD

10. Final output of the analysis.


Module: PRINT1
Functional Description:

This routine prints each output line of the final report.

Interface:


OUT IN

1. BODY: Information to be printed.

2. WIDIPLA: Output line of 133 characters


Module: GETOK0
Functional Description:

It reads the file named SYSUT0, transfers characters from the input
string into the buffer and also captures the frequency count of the
corresponding token string.

Interface:


OUT IN

1. SYSUT0: File containing the information of DD.

2. BUFFER0: Buffer containing the token string of DD.

3. FREQ0: Frequency count of the token.

4. INPUT-REC0: Each record of SYSUT0.

5. KAR: Current character in process.


Module: GETKAR0
Functional Description:

Given each record of the file SYSUT0, this routine captures one
character at a time until the end of record is encountered.

Interface:


OUT IN

1. INPUT-REC0: Each 80-character record of SYSUT0.
2. KAR: Current character in process.


Module: GETOK1
Functional Description:

It reads the file named SYSUT1, transfers characters from the input
string into the buffer and also captures the frequency count of the
associated token string.

Interface:


OUT IN

1. SYSUT1: File containing the information of PD.

2. BUFFER1: Buffer containing the token string of PD.

3. FREQ1: Frequency count of the token.

4. INPUT-REC1: Each record of SYSUT1.

5. KAR: Current character in process.


Module: GETKAR1
Functional Description:

Given each record of the file SYSUT1, this routine captures one
character at a time until the end of record is encountered.

Interface:


OUT IN

1. INPUT-REC1: Each 80-character record of SYSUT1.
2. KAR: Current character in process.
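
The division of labor between the token readers and the character readers can be sketched as below. The record layout is not given in this report, so the sketch assumes that the token string comes first and that the frequency count is right-justified at the end of the record; the Python names getkar and getok are hypothetical stand-ins for the modules described above.

```python
def getkar(record, width=80):
    """Yield the characters of one record, one at a time, up to `width`."""
    for kar in record[:width]:
        yield kar


def getok(record):
    """Split one record into (token string, frequency count).

    Assumed layout: token text first, count right-justified at the end.
    """
    buffer = []
    for kar in getkar(record):
        buffer.append(kar)               # transfer each character to the buffer
    end = len(buffer)
    while end > 0 and buffer[end - 1].isdigit():
        end -= 1                         # back up over the trailing digits
    freq = int("".join(buffer[end:]))
    token = "".join(buffer[:end]).rstrip()
    return token, freq
```

Under this layout a token may itself contain blanks or digits, since only the final run of digits is taken as the count.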


APPENDIX  -  B 
FILE  DESCRIPTIONS 

This appendix contains the operator and noiseword file listings used
when the analyzer is run in default mode. Files for both Data and
Procedure divisions are included. The members of these operator and
noiseword files (B.1 to B.4) are used to construct and initialize the
fundamental data structures, namely the operator and noise trees, used
in analyzing a COBOL program. Finally, the production JCLs to run the
analyzer are shown.


B.1 : DEFAULT OPERATOR FILE FOR DATA DIVISION (OPER0)


1. • •
2. FD
3. LABEL
4. BLOCK
5. OMITTED
6. DATA
7. STANDARD
8. VALUE
9. LINAGE
10. FOOTING
11. TOP
12. BOTTOM
13. CODE-SET
14. RECORD
15. RECORDING
16. REDEFINES
17. PICTURE = PIC
18. USAGE
19. SIGN
20. LEADING
21. SEPARATE
23. OCCURS
24. TO
25. DEPENDING
26. ASCENDING
27. DESCENDING
28. INDEXED
29. SYNCHRONIZED = SYNC
30. JUSTIFIED = JUST
31. BLANK
32. RENAME
33. THRU
34. TIMES
35. DISPLAY
36. COMPUTATIONAL = COMP
37. COMPUTATIONAL-1 = COMP-1
38. COMPUTATIONAL-2 = COMP-2
39. COMPUTATIONAL-3 = COMP-3
40. LEFT
41. RIGHT


B.2 : DEFAULT NOISEWORD FILE FOR DATA DIVISION (NOISE0)

ARE 

IS 

ON 

BY 

REPORT 

LINKAGE 

WORKING-STORAGE
FILE 
CONTAINS 
MODE 

CHARACTERS 

CHARACTER 

LINES 

WITH 

AT 

KEY 

WHEN 

DIVISION 

SECTION 


B.3  :  DEFAULT  OPERATOR  FILE  FOR  PROCEDURE  DIVISION  (OPER1) 


1. *
2. (
3. +
4. /
5. .MOVE
6. .DIVIDE
7. .REWIND
8. .CLOSE
9. .SORT
10. .EXIT
11. .RELEASE
12. .ACCEPT
13. .SUBTRACT
14. .COPY
15. .SEARCH
16. .GOBACK
17. .RESET
18. .READ
19. .ELSE
20. .UNSTRING
21. .ADD
22. .WRITE
23. .CANCEL
24. .STOP
25. .COMPUTE
26. .SET
27. .DISPLAY
28. .REWRITE
29. .ENTER
30. .RETURN
31. .EXAMINE
32. .READY
33. .IF
34. .OPEN
35. .INSPECT
36. .MULTIPLY
37. AND
38. WHEN
39. FOR
40. GIVING
41. EXCEPTION
42. NOT
43. EXTEND
44. FILE
45. FIRST
46. I-O
47. POSITIONING
48. REEL
49. REPLACE
50. UP
51. POINTER
52. TALLYING
53. DELIMITED
54. DEPENDING
55. DISP
56. LEADING
57. BEFORE
58. ERROR
59. TIMES
60. END
61. TRANSFORM
62. UNTIL
63. INTO
64. INVALID
65. REMAINDER
66. BEGINNING
67. COUNT
68. EXHIBIT
69. NEXT
70. UNIT
71. UPON
72. LOCK
73. USE
74. USING
75. VARYING
76. OR
77. REMOVAL
78. MERGE
79. DOWN
80. SUPPRESS
81. ENDING
82. ENTRY
83. OUTPUT
84. AFTER
85. ALL
86. REPLACING
87. ROUNDED
88. SEQUENCE
89. START
90. STRING
91. INPUT
92. CHANGED
93. CHARACTERS
94. NAMED
95. BY < REPLACE < UP < DOWN < DELIMITED
96. PROCEDURE < LABEL
97. FROM > CHARACTERS
98. CORRESPONDING = CORR
99. TO < EQUAL < PROCEED
100. ERROR > SIZE > ON
101. ON < DEPENDING
102. ELSE = OTHERWISE
103. TO < PROCEED
104. ON < OUTPUT < INPUT < I-O
105. DESCENDING > ON
106. GREATER = >
107. END-OF-PAGE = EOP
108. EQUAL = =
109. PROCEED < TO
110. LESS = <
111. PROCEDURE > OUTPUT > INPUT
112. ALL > SEARCH
113. OVERFLOW > ON
114. ASCENDING > ON
115. THROUGH = THRU

Note that the minus sign is not included in this list since it is
indistinguishable in form from a hyphen. The analyzer resolves this
ambiguity by context. Other context-sensitive operator information
appears explicitly in the section at the end of the file.


B.4  :  DEFAULT  NOISEWORD  FILE  FOR  PROCEDURE  DIVISION  (NOISE1) 


1.  ; 

2.  THAN 

3.  ) 

4.  THEN 

5.  , 

6. ADVANCING

7. AT

8. LINES

9. RUN

10. IS

11. SKIP1

12. SKIP2

13. SKIP3

14. EJECT


B.5  :  File,  INITW,  which  generates  output  for  production  purposes  (see 
WIDJET  in  B.9). 

1. MODE = 'PRODUCTION'

2. GO

3. PERFORM

4. CALL

5. ALTER

6. NOTE

7. -

8. PROCEDURE

9. PROGRAM-ID

10. DATA

11. $JOB

12. IDENTIFICATION


B.6 : File, INITR, which generates output for debugging purposes (see
OSUCNTPM).

1. MODE = 'DEBUG'

2. GO

3. PERFORM

4. CALL

5. ALTER

6. NOTE

7. -

8. PROCEDURE

9. PROGRAM-ID

10. DATA

11. $JOB

12. IDENTIFICATION

Note:  'PRODUCTION'  is  the  production  mode,  which  generates  the  regular 
output  (see  section  3.3).  On  the  other  hand,  'DEBUG'  is  the  debugging 
mode.  Its  output  consists  of  a  trace  of  input  stream  tokens  helpful  for 
debugging  purposes.  The  output  of  this  mode  is  different  from  that 
obtained  in  production  mode. 


B.7  :  PRODUCTION  JCL  (OSUCNTPM) 

//          EXEC PGM=OSUCNTPM
//STEPLIB   DD DSN=TS0618.LOADLIB,DISP=SHR
//OPER      DD DSN=TS0618.CSLIB(OPER1),DISP=OLD,UNIT=USERDA
//OPER2     DD DSN=TS0618.CSLIB(OPER0),DISP=OLD,UNIT=USERDA
//NOISE     DD DSN=TS0618.CSLIB(NOISE1),DISP=OLD,UNIT=USERDA
//NOISE2    DD DSN=TS0618.CSLIB(NOISE0),DISP=OLD,UNIT=USERDA
//INIT      DD DSN=TS0618.CSLIB(INITR),DISP=OLD,UNIT=USERDA
//SYSPRINT  DD SYSOUT=A
//WIDIPLA   DD SYSOUT=A
//WIDSUM0   DD DUMMY
//WIDSUM1   DD DUMMY
//WIDSUM2   DD DUMMY
//WIDFREQ   DD DUMMY
//SYSUT0    DD DSN=TS0618.CIS212.SYSUT0,UNIT=USERDA,DISP=OLD
//SYSUT1    DD DSN=TS0618.CIS212.SYSUT1,UNIT=USERDA,DISP=OLD
//SYSUT2    DD DSN=TS0618.CIS212.SYSUT2,UNIT=USERDA,DISP=OLD

This  JCL  invokes  the  analyzer  and  generates  hard  copy  of  the  analysis 
report,  but  does  not  store  the  analyzer  output  into  a  separate  disk 
file  for  future  use. 


B.8  :  JCL  FOR  INVOKING  WIDJET  (INTERNAL) 


//          EXEC PGM=IEBGENER
//SYSUT2    DD SYSOUT=(A,INTRDR),DCB=(RECFM=FB,LRECL=80,BLKSIZE=80)
//SYSPRINT  DD SYSOUT=A
//SYSIN     DD DUMMY
//SYSUT1    DD DSN=TS0618.WIDJET,DISP=OLD,UNIT=USERDA
//          DD DDNAME=PROGRAM


This  JCL  file  called  INTERNAL  is  used  to  invoke  the  production  JCL 
required  to  run  the  analyzer  through  the  on-line  terminal.  In 
particular,  WIDJET  or  WYLBUR  users  invoke  INTERNAL,  which  in  turn 
invokes  the  JCL  file  called  WIDJET  through  the  internal  reader. 


B.9  :  PRODUCTION  JCL  (WIDJET) 


//COUNTPGM  JOB 'FRB080,212313080','ANALYZER' C.
/*JOBPARM V=D
//          EXEC PGM=OSUCNTPM
//STEPLIB   DD DSN=TS0618.LOADLIB,DISP=SHR
//SYSPRINT  DD SYSOUT=A
//WIDIPLA   DD SYSOUT=A
//OPER      DD DSN=TS0618.CSLIB(OPER1),DISP=OLD,UNIT=USERDA
//OPER2     DD DSN=TS0618.CSLIB(OPER0),DISP=OLD,UNIT=USERDA
//NOISE     DD DSN=TS0618.CSLIB(NOISE1),DISP=OLD,UNIT=USERDA
//NOISE2    DD DSN=TS0618.CSLIB(NOISE0),DISP=OLD,UNIT=USERDA
//INIT      DD DSN=TS0618.CSLIB(INITW),DISP=OLD,UNIT=USERDA
//WIDSUM0   DD DUMMY
//WIDSUM1   DD DUMMY
//WIDSUM2   DD DUMMY
//WIDFREQ   DD DSN=TS0618.CIS212.FREQ,UNIT=USERDA,DISP=(MOD,CATLG),
//          DCB=(RECFM=FB,LRECL=80,BLKSIZE=80,DSORG=PS),
//          SPACE=(TRK,(50,20))
//SYSUT0    DD DSN=TS0618.CIS212.SYSUT0,UNIT=USERDA,DISP=OLD
//SYSUT1    DD DSN=TS0618.CIS212.SYSUT1,UNIT=USERDA,DISP=OLD
//SYSUT2    DD DSN=TS0618.CIS212.SYSUT2,UNIT=USERDA,DISP=OLD
//SOURCE    DD *

This JCL file called WIDJET is used to run the analyzer and to collect
Software Science data from student COBOL programs at Ohio State
University. One should note the file 'WIDFREQ' and its parameters.

In particular, DSN=TS0618.CIS212.FREQ gives the file name on the
disk where all the data from the analyzer is to be collected, and the
SPACE parameter provides the primary and secondary storage
allocation for this disk file. In the present case, the primary
storage allocation for TS0618.CIS212.FREQ is 50 tracks and the
secondary allocation is 20 tracks (up to 16 extents).


B.10 : THE EXEC PROGRAM INVOKED BY THE "ANALYZE" COMMAND


SET EXEC NOLOG TERSE
ON ERROR
ON ATTN

SET VALUE S0 LAST
IF (S0 EQ 0.000) EXEC 11.01

COMMENT  ************************************************ 

COMMENT  THE  ACTIVE  FILE  MUST  BE  EMPTY  TO  RUN  THE 
COMMENT  ANALYZER.  TRY  AGAIN  AFTER  SAVING  THE  FILE  IF 
COMMENT  NEEDED  AND  ALSO  CLEARING  THE  ACTIVE  FILE. 

COMMENT  ************************************************ 

EXEC  54 
COMMENT 

COMMENT WHICH COURSE IS THIS (212 OR 313)?

READ STRING S0
IF (S0 EQ '212') EXEC 12
IF (S0 EQ '313') EXEC 61
EXEC  11.01 
CLEAR  ACTIVE 
SET  ESCAPE  1 

COMMENT  WHICH  OF  THE  FOLLOWING  LAB  NUMBERS  DO  YOU  WANT  ANALYZED? 
COMMENT 

COMMENT  02  03  04  05  06 

COMMENT 

READ STRING S1
IF (S1 EQ '02') EXEC 28
IF (S1 EQ '03') EXEC 28
IF (S1 EQ '04') EXEC 28
IF (S1 EQ '05') EXEC 28
IF (S1 EQ '06') EXEC 28
COMMENT 

COMMENT  ********INVALID  LAB  NUMBER******** 

COMMENT 

COMMENT  PLEASE  RE-ENTER  THE  LAB  NUMBER  AGAIN. 

EXEC  14 

SET VALUE S2 SUBSTR(GROUP,3,1)||USER
COMMENT  WHAT  IS  YOUR  LAST  NAME? 

READ  STRING  S4 

COMMENT WHAT IS THE NAME OF YOUR LAB FILE?

READ  STRING  S3 

IF  (  VERIFY (S3» ' ♦' )  EQ  2  )  COPY  FROM  *S3  TO  10  BY  10 
IF  (  VERIFY ( S3 » ' ♦ ' )  EQ  1  )  COPY  FROM  **S3  TO  10  BY  10 
POINT  '  JOB  ' 

IF (CURRENT LT 0) EXEC 40
SET VALUE W1=*

POINT ' *JOB' '

SET VALUE W2=*

DEL 1W1/JW2
POINT '// '

IF (CURRENT GT 0) EXEC 44
SET VALUE W3=*

99999.99  // 

0.001  //  JOB 
0.002 /*JOBPARM V=D

0.003 //PROCLIB DD DSN=TS0618.PROCLIB
0.004 // EXEC INTERNAL

0.005 //PROGRAM DD *

0.006  *JOB  1S11S2  1S4 

RUN 

COMMENT  LAB  1S1  HAS  BEEN  ANALYZED  UNDER  THE  TC1S2  NUMBER. 

SET  ESCAPE  " 

CLEAR  ACTIVE 
CLEAR  EXEC 
CLEAR  ACTIVE 
SET  ESCAPE  ( 

COMMENT  WHICH  OF  THE  FOLLOWING  LAB  NUMBERS  DO  YOU  WANT  ANALYZED? 
COMMENT 

COMMENT L1 L2 L3

COMMENT

READ STRING S1
IF (S1 EQ 'L1') EXEC 28
IF (S1 EQ 'L2') EXEC 28
IF (S1 EQ 'L3') EXEC 28
COMMENT 

COMMENT  ********  INVALID  LAB  NUMBER******** 

COMMENT 

COMMENT  PLEASE  RE-ENTER  THE  LAB  NUMBER  AGAIN. 

EXEC  63 


B.11 : SAMPLE HANDOUT

CIS 212

Instructions to run the Analyzer


Run your final version of the program to get the printout for submission. AFTER YOU GET
THE FINAL AND CORRECT OUTPUT, USE THE FOLLOWING STEPS TO RUN THE ANALYZER.

1. Clear your active file by using the command: CLEAR ACTIVE

2. Type the following command:

Command? ANALYZE

After you have typed the ANALYZE command and pressed the RETURN key, you will be asked
the following questions. Simply enter the answers to these questions.


Questions asked                          Answers to be submitted

a. What course is this? (212 or 313)     a. Enter 212

b. Which of the following lab numbers    b. If you want to analyze Lab 2, type 02
   do you want analyzed?                    If you want to analyze Lab 3, type 03
                                            If you want to analyze Lab 4, type 04
   02 03 04 05 06                           If you want to analyze Lab 5, type 05
                                            If you want to analyze Lab 6, type 06

c. What is your Last Name?               c. Enter your Last Name

d. What is the name of your Lab file?    d. Enter the actual file name that you
                                            have used to save your Lab.
                                            (For example, if you are analyzing
                                            Lab2, and the actual file name for
                                            Lab2 is PROG2, then enter PROG2.)


After  you  have  answered  these  questions,  wait  till  you  get  the  following  message: 

LAB  (02  or  03  or  04  or  05  or  06)  HAS  BEEN  ANALYZED  UNDER  THE  TRnnnn  NUMBER 

Where  nnnn  is  the  4-digit  number  of  your  personal  user-id.  As  soon  as  you  get  this  message, 
you  are  done  with  the  analyzer. 

If you have any questions, comments or difficulties, please see your instructor or Mr. DEBNATH
in CL 418.


APPENDIX - C


Maintenance  Procedure 


The Software Science COBOL Analyzer developed at OSU is a significant
tool of the Software Metrics Research Group in the Department of
Computer and Information Science. The Analyzer is used to collect the data
needed to pursue further research in Software Science from the students
of undergraduate COBOL classes (e.g. CIS212, CIS313). In addition to the
students' programs, various kinds of COBOL programs are also collected for
analysis from the University Systems Computer Center as well as from other
organizations. Therefore, the person responsible for maintaining the
Analyzer has direct interaction with many different groups of people. The
present section provides a few of the major steps to be followed for
maintenance of the Analyzer. Particular attention is given to its
interface with undergraduate COBOL classes.

Procedure:

1. It is important to become familiar with the working of the analyzer
program as well as to know how to use the analyzer in the OSU environment.

2.  The  handout  containing  the  necessary  instructions  to  run  a  job  through 
the  analyzer  should  be  provided  to  each  student  at  the  beginning  of  the 
quarter  (see  sample  handout  in  Appendix-B) 

3. CREATE THE DISK SPACE

Before  the  students  start  running  their  jobs  through  the  analyzer, 
space  should  be  created  on  the  disk  in  order  to  collect  the  students' 
output  from  the  analyzer.  Currently  the  disk  file  called  'CIS212.FREQ'  is 
used  for  this  purpose.  The  disk  space  for  the  file  'CIS212.FREQ'  is 
created  according  to  the  SPACE  parameter  used  in  the  JCL  file  called  WIDJET 
(see  Appendix-B);  e.g. 

SPACE=(TRK,(50,20)), DISP=(MOD,CATLG)

Thus,  the  size  of  this  disk  file  can  be  changed  simply  by  changing  the 
SPACE  parameter  as  desired. 

4.  ARCHIVE  THE  DATA  ON  THE  DISK 

Disk space is very expensive and the collected data on the disk are
not usually used during the quarter. Therefore, these data must be
archived to tape quite frequently using the ASM2 commands [IRCC ASM2
Manual]. It is strongly suggested that every 50-100 tracks of data
be archived to tape. The archive command, $AR, provides the user
with the ability to archive the specified data set(s) to tape.

Example : $AR 'DSLIST'

'DSLIST' is the list of input data set names.

5. ASSIGN APPROPRIATE NAME AND MAXIMUM RETENTION PERIOD FOR THE FILE TO BE
ARCHIVED

While archiving the data from the disk, the file should be assigned a
name which is different from the original disk file name and from any
previously archived file name. The new file name should reflect the name
of the quarter when the data was collected. This makes the file easy to
recognize when it is used after a long period of time. One should also specify
the maximum retention period (e.g. 365 days) for the file being archived to
tape.

The archive command [ASM2 Manual] to be used in order to satisfy the
requirements of steps 4 and 5 has the following syntax.

$AR  'DSLIST'  RETPD  ('INTEGER')  QUAL  ('QUALIFIER') 

Operands : 

'DSLIST'  -  List  of  input  data  set  names. 

RETPD  ('INTEGER') 

-  Specifies  the  desired  retention  period  on  tape 

-  'INTEGER'  -  a  1  to  3  digit  value  which  defines  the  number  of  days  the 
user  wishes  to  have  the  specified  data  set  retained  on  archive  tape. 

QUAL  ('QUALIFIER') 

-  causes  the  data  set  to  be  renamed  when  archived 

'QUALIFIER'  -  a  1  to  8  character  string  in  which  the  first  character  must 
not  be  numeric. 


Example: 

$AR CIS212.FREQ RETPD (365) QUAL (FALL1981)

This example illustrates that the disk file named CIS212.FREQ would be
archived to tape with the new name CIS212.FALL1981.FREQ, and that a 365-day
retention period would be in effect.

6.  RETRIEVE  THE  ARCHIVED  DATA  SET  AND  PRODUCE  QUARTERLY  REPORT. 

At  the  end  of  the  quarter,  the  data  collected  for  the  entire  quarter 
should  be  restored  on  the  disk  to  produce  two  different  quarterly  reports. 

The  first  report  provides  information  concerning  which  students  have  run 
which  course  labs  through  the  analyzer.  This  report  can  be  provided  to  the 
instructors  of  the  course  in  case  course  grades  are  influenced  by  the 
students'  use  of  the  analyzer.  The  second  report  summarizes  the  Software 
Science  metrics  collected  for  all  the  programs  during  the  quarter,  and  is 
used  by  the  software  metrics  research  group. 

Both  reports  require  some  sorting  of  the  data.  The  procedure  for 
producing  the  reports  is  described  below. 

(a) The sort keys for the first report are as follows:
Key1 : Student's account number
Key2 : Lab number.

It is desirable that each page of this report contain the account number of
a particular student and the lab numbers which were run by this student
during the whole quarter. This format makes it easier for the course
instructors to use this information for grading purposes. Presently, a
program called "REPBONUS" generates this report. This report should be
distributed to all the course instructors before the final examination
begins.


(b)  The  second  report  should  be  sorted  according  to  lab  numbers  so 
that  all  the  data  for  each  lab  appear  together.  This  makes  the 
analysis  convenient.  Currently,  a  program  named  "REP"  is  used  to 
produce  this  quarterly  report. 
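
The two orderings described in (a) and (b) can be sketched as follows. This is only an illustration and not the logic of the report programs themselves: the record fields "account" and "lab", and the function names labs_by_student and runs_by_lab, are hypothetical.

```python
from itertools import groupby


def labs_by_student(runs):
    """First report: sort on account number, then lab number, and group."""
    ordered = sorted(runs, key=lambda r: (r["account"], r["lab"]))
    return {
        account: [r["lab"] for r in group]
        for account, group in groupby(ordered, key=lambda r: r["account"])
    }


def runs_by_lab(runs):
    """Second report: keep all the data for each lab together."""
    return sorted(runs, key=lambda r: r["lab"])
```

Grouping after the two-key sort yields one entry per student, which corresponds to the one-page-per-student layout suggested for the first report.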

The ASM2 command $RA (Reload from Archives) can be used to restore an
archived data set from an archive tape to an on-line disk pack, and has the
following syntax.

$RA 'DSLIST'

Example:

$RA CIS212.FALL1981.FREQ

The use of this reload command will allow the desired data sets to be
reloaded onto the disk. After transferring all the required data to the
disk, the programs "REPBONUS" and "REP" can be run on these data to
produce the above reports.


The following WYLBUR commands are used to run "REP" and "REPBONUS".


Command?  SET  GROUP  TS1 
Command?  SET  USER  483 

Command?  USE  REP  (or  REPBONUS)  ON  CATLG 


Command?  RUN 


7.  FREE  UP  THE  UNNECESSARY  DISK  SPACE. 


After producing the quarterly reports, all disk space must be freed
as quickly as possible, either by scratching the data sets or by
archiving them back to tape, depending on whether a copy of the data sets
already exists on tape.

The  following  Wylbur  command  is  used  to  scratch  a  data  set  from  the  disk. 

Command?  SCRATCH  (Data  set  name) 

The  same  $AR  command,  as  discussed  above,  is  used  to  archive  the  data  set 
to  tape. 

Note that maintenance of the analyzer involves handling a large number
of important files existing on disk as well as on tape. In order to
maintain these files efficiently, one should remember that only those files
which are used very often should be kept on disk; other files should be
saved on tape. The retention period of all files on tape MUST be checked
periodically by using the $AI (Archive Catalog Information) command [ASM2
Manual], and the expiration date must be updated if necessary. Another
possibility is that the data can be saved on a personal tape. Otherwise,
it is possible that important data will be lost.

Finally, the programs collected from sources other than OSU courses
should be run through the analyzer using the necessary JCL, and the output
should be saved in separate files, if possible. The present JCL called
"JCLTAPE" is used to run the programs that are collected from the
University Systems Computer Center and from other organizations. The
output from the analyzer is collected in the separate disk file called
"US.FREQ". Since the source programs for these analyses exist on a
personal tape, it is possible to scratch the disk file containing the
results of the analyses after producing the final report (report #2
described earlier in this section). No archiving of the analysis data is
necessary for these programs.


APPENDIX  -  D 


INVOKING THE PURDUE ANALYZER AT IRCC


D.l:  JCL  REQUIRED  TO  RUN  THE  PURDUE  ANALYZER 


For  comparison  purposes  a  COBOL  analyzer  written  at  Purdue  University  can 
be  run  at  the  IRCC  of  Ohio  State  University  using  the  following  JCL. 

//JOB
//             REGION=512K
/*JOBPARM LINES=7000,DISKIO=5000
//SA           EXEC COBSORT
//COB.SYSIN    DD *

      (TS1483.COBOL.KEYWORD.FILE file)

/*
//GO.SORTFIL   DD UNIT=SYSVIO,SPACE=(CYL,(1,1))
//GO.KEYWD     DD DSN=TS1483.KEYWORDS.DATA,
//             UNIT=USERDA,DISP=(NEW,CATLG),
//             SPACE=(CYL,(1,1)),
//             DCB=(RECFM=FB,LRECL=30,BLKSIZE=300)
//SECOND       EXEC COBUCG,TIME=10
//COB.SYSIN    DD *

      (TS1483.PURDUE.COBOL.ANALYZER file)

/*
//GO.SYSIN     DD *

      (COBOL program)

/*
//GO.KEYWD     DD DSN=TS1483.KEYWORDS.DATA,UNIT=USERDA,DISP=SHR
//GO.OUTPUT    DD SYSOUT=A,
//             DCB=(RECFM=FBA,LRECL=133,BLKSIZE=133)
//GO.SUMFILE   DD SYSOUT=A,
//             DCB=(RECFM=FBA,LRECL=133,BLKSIZE=133)


The Purdue Analyzer consists of two separate programs, namely SORT-TAB and
ANALYZE. Both of these programs have been compiled, with the necessary
modifications, using the standard COBOL compiler at Ohio State University,
and renamed TS1483.COBOL.KEYWORD.FILE and TS1483.PURDUE.COBOL.ANALYZER,
respectively. The first program generates a file of predefined COBOL
keywords and noisewords, while the second program analyzes any COBOL source
program given as input using the counting strategy defined by the first
program.

Note that the COBOL program to be analyzed should include both the Data
and Procedure divisions. Separate analyses will be done for the two
divisions, using the predefined counting strategy developed at Purdue
University.


D.2:  OUTPUT OF THE PURDUE ANALYZER

The output of the Purdue Analyzer consists of a listing of the input
file followed by the statistics for the entire program. The statistics
include the list of operators and operands, along with their corresponding
frequency counts, for both the Data and Procedure divisions. A summary of
the basic software metrics, namely ETA1, N1, ETA2, N2, N, NHAT and
REL ERROR, is also produced. A sample output generated by the Purdue
analyzer is included for completeness.


STATISTICS FOR THIS MODULE

DATA DIVISION

OPERATOR     ASSIGN   BLOCK   FD
FREQUENCY    97   4   1   4

ETA1 =
N1   =

OPERAND               FREQUENCY
PRINTER-FILE1              2
PRINTER-FILE2              2
X(8)                       4
CURR-DATE                  2
0                         13
PR-MM                      1
9                         10
. . .

ETA2 =
N2   =

PROCEDURE DIVISION

OPERATOR              FREQUENCY
ACCEPT                     1
ADD                       16
CLOSE                      1
MOVE                      76
OPEN                       1
PERFORM EDIT-CHECK         1
PERFORM HEADER-PR1         1

ETA1 =
N1   =

OPERAND               FREQUENCY
PRINTER-FILE1              2
PRINTER-FILE2              2
31                         1
' *'                       1
CURRENT-DATE               1
CURR-DATE                  2

ETA2 =
N2   =

                   DATA    PROCEDURE    MODULE
ETA1                  8           31        39
N1                  238          269       507
ETA2                 74           84        94
N2                  159          358       517
N                   397          627      1024
NHAT                483          690       822
REL ERROR (N)    0.1780       0.0913    0.2457

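
The summary figures in the sample output can be checked by hand. The sketch below, written in Python purely for illustration, recomputes N, NHAT and REL ERROR from the four basic counts. NHAT uses Halstead's length estimator, ETA1 log2 ETA1 + ETA2 log2 ETA2. Two points are assumptions inferred from the sample numbers rather than taken from the analyzer's source: that NHAT is truncated to an integer, and that REL ERROR is |NHAT - N| / NHAT. The function names are hypothetical.

```python
import math


def estimated_length(eta1: int, eta2: int) -> int:
    """Halstead's length estimator NHAT.

    Truncation to an integer is an assumption; it is what reproduces
    the NHAT values shown in the sample output.
    """
    return int(eta1 * math.log2(eta1) + eta2 * math.log2(eta2))


def relative_error(n: int, nhat: int) -> float:
    """Relative error of the estimate (assumed form: |NHAT - N| / NHAT)."""
    return abs(nhat - n) / nhat


def summarize(eta1: int, n1: int, eta2: int, n2: int):
    """Return (N, NHAT, REL ERROR) for one division or the whole module."""
    n = n1 + n2  # observed length: total operators plus total operands
    nhat = estimated_length(eta1, eta2)
    return n, nhat, relative_error(n, nhat)


# (ETA1, N1, ETA2, N2) copied from the sample summary table.
SAMPLE = {
    "DATA": (8, 238, 74, 159),
    "PROCEDURE": (31, 269, 84, 358),
    "MODULE": (39, 507, 94, 517),
}

if __name__ == "__main__":
    for name, counts in SAMPLE.items():
        n, nhat, err = summarize(*counts)
        print(f"{name:<10} N={n:5d}  NHAT={nhat:4d}  REL ERROR={err:.6f}")
```

Under these assumptions the recomputed N and NHAT agree with the table, and the relative errors agree with the printed values to within the fourth decimal place.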

D.3:  COMPARISON  BETWEEN  OSU  AND  PURDUE  ANALYZER 


The  following  differences  have  been  observed  between  the  outputs  of  the 
two  analyzers. 


                PURDUE                         OSU

DD Operators:   ASSIGN and SELECT are          ASSIGN and SELECT are not
                treated as DD operators.       taken into consideration.
                LABEL, RECORD and STANDARD     Instead, LABEL and STANDARD
                are skipped.                   are treated as DD operators.

DD Operands:    FILLER and labels are not      FILLER, labels and all
                counted; some X(n)'s in        X(n)'s are counted.
                PICs are also not counted.

PD Operators:   Operators such as TO, INTO,    All the standard operators
                NEXT, AFTER, FROM, INPUT,      are counted; the major
                OUTPUT, CORRESPONDING and      differences come from the
                EOS are omitted altogether     EOS and TO counts. EOS is
                from the operator-operand      the number of verbs.
                list.

                NUMERIC is treated as an       NUMERIC is treated as an
                operator.                      operand.

                EQUAL and NOT EQUAL are        NOT and EQUAL are separated
                treated as separate            and treated as two
                operators; similarly           different operators.
                GREATER, NOT GREATER, etc.

                Instead of end-of-statement
                markers (EOS), Purdue counts
                required periods.


PD operand counting is closely comparable, except that the OSU analyzer
counts some additional operands, e.g. SENTENCE and OF.




It is noted that (1) the main differences in counts come from DD operands
and PD operators, and (2) the Purdue analyzer does not count PD operators
such as TO and FROM. These are used with arithmetic verbs and are sometimes
not optional: in MOVE A TO B, TO is required, whereas in A EQUAL TO B, TO is
optional; there are many such examples. OSU counts all these required
operators.
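
The effect of these differences on PD operator counts can be illustrated with a small sketch. The Python code below is a toy model, not either analyzer's actual logic: the word lists, the function name count_operators, and the treatment of IF as a verb are all hypothetical simplifications. It models only three of the differences listed above (TO-type words, NOT EQUAL, and EOS versus periods), counts every period as a required period, and does not model the optional TO in A EQUAL TO B.

```python
from collections import Counter

# Hypothetical, heavily reduced word lists for illustration only;
# the real analyzers work from much larger keyword tables.
VERBS = {"IF", "MOVE", "ADD"}
CONNECTIVES = {"TO", "FROM", "INTO"}
RELATIONS = {"EQUAL", "GREATER", "LESS"}


def count_operators(tokens, strategy):
    """Count PD operators under a toy 'purdue' or 'osu' strategy."""
    counts = Counter()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in VERBS:
            counts[tok] += 1
            if strategy == "osu":
                counts["EOS"] += 1  # OSU: EOS equals the number of verbs
        elif tok == ".":
            if strategy == "purdue":
                counts["."] += 1  # Purdue: periods counted instead of EOS
        elif tok == "NOT" and nxt in RELATIONS:
            if strategy == "purdue":
                counts["NOT " + nxt] += 1  # one combined operator
                i += 1  # skip the relation word just consumed
            else:
                counts["NOT"] += 1  # relation is counted on the next pass
        elif tok in RELATIONS:
            counts[tok] += 1
        elif tok in CONNECTIVES and strategy == "osu":
            counts[tok] += 1  # Purdue skips these words entirely
        i += 1  # anything unmatched is treated as an operand and ignored
    return counts


if __name__ == "__main__":
    sentence = "IF A NOT EQUAL B MOVE A TO B .".split()
    for strategy in ("purdue", "osu"):
        print(strategy, dict(count_operators(sentence, strategy)))
```

For the single sentence used here, the toy Purdue strategy yields four operator occurrences and the toy OSU strategy yields seven, which shows on a small scale why EOS and TO dominate the differences in PD operator counts.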